The first time I started looking into virtualization, Docker, Kubernetes and so on, it was for work, about a year ago.
I happened to like it, to the point where I have now built a nice Kubernetes cluster at home, which can always be improved (and will be).
If I say all that, it's because I want one thing to be very clear: the main goal of this document is to help you build your homelab, and nothing more. I won't go into the details of setting up a full blown production ready cluster... That said, we won't be very far from that, and it will be more than enough for all your homelab needs.
Second disclaimer: English isn't my main language, so if you spot any mistakes let me know.
Third disclaimer: this tutorial is written for Linux and Mac based environments; on Windows some commands might differ.
Last disclaimer: I assume here that you already have basic knowledge of IT, virtualization and containers, so I take some shortcuts here and there.
First things first, we need machines: hardware, something that we can use, possibly destroy, mess with, etc., you get the idea. For this purpose I want to use virtual machines, as it is easy to recover from an issue and avoid mistakes while we are learning. The advantage here is that once we know this setup works on a set of virtual machines, we can replicate it on real machines and it should be fine.
There are a TON of ways to set up virtual machines; to name just a few, VirtualBox, VMware, Proxmox and Terraform are all names you should be familiar with. In 2020 one would set everything up with Terraform and Ansible, as that tech is made exactly for what we are about to do, but in order not to make things more complicated for beginners I won't use it here for now. Feel free to set up your environment that way if you know how to do it; for everyone else, let's continue.
So for this demo I will use VMware and set everything up manually. Feel free to use VirtualBox, as it is completely free and does the exact same job; I just use VMware because I have a licence for it.
The first step here is to create three identical machines (you can decide whether you want more RAM or CPU than what I show here, it's up to you). Those three machines will run our cluster.
Each machine I created is setup like this:
- OS Ubuntu server available at https://ubuntu.com/download/server (choose the manual install option and download the ISO, then use it to create your VM)
- 2 CPU Cores
- 4GB of ram
- 20GB of disk
It is clearly overkill for what we are doing; I have the hardware to handle it so there is no issue here, but your mileage may vary, so feel free to adjust each machine to your own computer's hardware. Also remember we will be running all three machines at the same time, so make sure your computer can handle everything before starting.
Now you can install Ubuntu Server on each machine. It will ask you a bunch of questions; just make sure to install OpenSSH during the installation process, so we can remotely access the machines. This step is very important and necessary, so pay attention when the installer asks you... AND SAY YES!
You can name each machine as you wish, just remember that one machine will be the master and the other two will be the slaves.
I named my machines like this:
- KubernetesMaster
- KubernetesSlave1
- KubernetesSlave2
Each machine will be given an IP by your virtualization platform of choice; just make sure to take note of these IPs, as they will be needed in a minute.
At this point each machine should be up and running and you should see a nice login screen (by nice I mean literally just the word login but that's to be expected)
Now what we want to do is add a user that will be responsible for handling the cluster setup. For demonstration purposes we will create a user called k3 (spoiler alert: if you are familiar with kube you know where this is going ;) )
In order to do so we must log into each machine and create the user. So that we don't have to handle each machine through the virtualization interface, I recommend you get a terminal (command line on Windows), which will make our life much easier. On Mac I use iTerm2, but feel free to use whatever you want.
Now we have to remotely access each machine from our terminal (here is where SSH and the IPs come into play):
ssh <username>@<machine-ip>
(Of course replace the values in angle brackets with your own values.)
Side note: to terminate an ssh connection you must type 'exit' into the terminal.
In order to create the user k3 we must run the following command (on each machine):
sudo adduser k3
This command will ask for your password and then for the new k3 user's password; I set mine to 'test', but you can choose whatever you want.
Now we must allow this user to have admin rights when necessary, without having to type the password every time (we won't keep this for very long, just for the initial setup, as it saves us some time). In order to do so, run the following:
sudo visudo
You are now taken to a very important file (do not erase anything here); this file is where we allow k3 to run sudo commands without a password. Go to the very end and add the following line (again, on all machines):
k3 ALL=(ALL) NOPASSWD:ALL
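If you want a quick sanity check (entirely optional), you can switch to the k3 user and run a sudo command; it should not ask for any password:
su - k3
sudo whoami # should print 'root' without a password prompt
exit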
One last thing we must do is make it possible to SSH into our machines without having to type k3's password every time we connect. To do so, exit the SSH connection by typing exit in the terminal; this kills the connection to that machine and brings you back to your own computer. From your computer you can now run (again, three times) the following command:
ssh-copy-id k3@<machine-ip>
This will ask you a few questions and a password (k3's password); once everything goes well, you can SSH into any of the three machines (as k3) without having to type k3's password.
OK! So now we are at the point where things are ready! We have three machines, all set up with a k3 account that can do pretty much everything on them (I know, security wise it's not the best, but those permissions will be revoked after the setup).
Kubectl is the tool used to access and run commands on a Kubernetes cluster; it's essential that you have it, as it is THE tool for doing pretty much everything on your cluster. You can follow this link to install it on your machine: https://kubernetes.io/docs/tasks/tools/install-kubectl/
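For reference, on Linux the installation from that page usually boils down to something like the following (double check the link above for the current instructions; on a Mac, brew install kubectl also does the job):
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client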
The Kubernetes flavour we will set up today is K3s. There are tons of other distributions out there, but after extensive research I found that K3s has many advantages: it is very easy to install, has a very small footprint, and is backed by Rancher, which is a big company in the Kubernetes space. On top of that we are lucky, because someone already made a pretty cool tool that installs it for us with a very simple command.
The tool we will use for the job is k3sup (https://github.com/alexellis/k3sup). I invite you to check its README, but if you don't have time I will put the important parts here.
The first step is to install the k3sup tool on our computer (not on the VMs); to do so you need the following commands:
curl -sLS https://get.k3sup.dev | sh
sudo install k3sup /usr/local/bin/
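You can quickly check that the tool is available with:
k3sup version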
Once this is done we can proceed and install k3s on one of our virtual machines; this machine will become the master, so choose carefully. To install k3s you must run the following command:
k3sup install --ip <master-server-ip> --user k3 --k3s-extra-args '--no-deploy traefik'
Side note: as you can see, I pass --k3s-extra-args '--no-deploy traefik'. This is because I do not want the Traefik that ships with k3s, as it is version 1.7 and we are now at 2.3.x, which brings a bunch of amazing features that 1.7 lacks.
You should get a message letting you know that everything is set up correctly, along with a few commands written out for you to test it. Basically it tells you to point your KUBECONFIG environment variable at the kubeconfig file it just wrote and then run
kubectl get nodes
Here you should see the node you just installed.
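For reference, the commands k3sup prints out look something like this (the kubeconfig file is written in the directory you ran k3sup from, so adapt the path to your case):
export KUBECONFIG=$PWD/kubeconfig
kubectl get nodes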
Now we must add the two other machines to our cluster so that we have a nice multi-node cluster running our things. k3sup comes in handy once again, as there is a simple command to do just that:
k3sup join --ip <machine-to-add-ip> --server-ip <master-server-ip> --user k3
Run this command for each of the two new machines that you want to add to your cluster.
Feel free to add as many machines as necessary; I only used three in this example, but technically you can go as high as your hardware allows.
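If all went well, kubectl get nodes should now list the three machines, with an output along these lines (names, ages and versions will of course differ on your setup):
NAME               STATUS   ROLES    AGE   VERSION
kubernetesmaster   Ready    master   10m   v1.19.5+k3s1
kubernetesslave1   Ready    <none>   3m    v1.19.5+k3s1
kubernetesslave2   Ready    <none>   2m    v1.19.5+k3s1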
(Side note: I won't use Helm in this tutorial, because I want to understand and control everything that is happening. This approach helped me understand Kubernetes instead of relying on the magic Helm brings; I believe it is better while learning, and once you fully understand what is happening, feel free to use Helm.)
We are almost at the end of this small doc (I imagine, since I still haven't written everything; if it keeps growing, please don't blame me)... Now we have a running cluster, fully ready to take whatever you want to put on it.
But we still need to add a couple of things for everything to be 100% ready.
(Be ready: from here on I will start talking in Kubernetes terms; if you don't understand them, you will have to look them up in the official Kubernetes documentation: https://kubernetes.io/)
The first thing we need is called MetalLB; this tool allows you to create LoadBalancer services on your cluster, which will be relevant later when we tackle other software.
In order to install MetalLB there are a couple of commands we must run (the version may vary depending on when you are reading this; feel free to bump it in the links below if a newer one is available):
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.4/manifests/namespace.yaml
kubectl apply -f https://raw.githubusercontent.com/metallb/metallb/v0.9.4/manifests/metallb.yaml
# On first install only
kubectl create secret generic -n metallb-system memberlist --from-literal=secretkey="$(openssl rand -base64 128)"
Those files create everything needed to use MetalLB in your cluster, EXCEPT one thing: its configuration, which you must write yourself. To do so, create a YAML file with your text editor of choice; let's call it 'ips-configmap.yml'. Here is my configuration, just remember to put your own IP range in the file and change the name of the address pool. These IPs will be handed out by MetalLB to your services, so use IPs you know you can reach: on a typical home network like 192.168.1.X you could for example give a range like 192.168.1.200-192.168.1.250, and if you are following this tutorial with VMs, look at the IPs your VMs got; if they look like 123.123.123.12 you can use a range like 123.123.123.XXX-123.123.123.YYY in the same subnet:
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: <default-name>
      protocol: layer2
      addresses:
      - <start-ip>-<end-ip>
Save the configuration file and apply it to the cluster (for the following command you must be in the same folder as your yml file, otherwise you must give the path to it, like /path/to/config/ips-configmap.yml):
kubectl apply -f ips-configmap.yml
If it says that the configmap was created you are golden ;)
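You can also double check that MetalLB itself is running; you should see one controller pod and one speaker pod per node, all in Running state (pod names will differ):
kubectl get pods -n metallb-system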
The next thing we need is a way to route traffic from outside into the applications running in the cluster, in other words a reverse proxy / ingress controller. The solution we will use for this is called Traefik. It is pretty much bulletproof, used in the industry by tons of companies, and free, which in my honest opinion makes it the best tool for the job; once you understand it and it is set up, you can forget about it, it just works.
So, onto Traefik. First we need to add some custom resource definitions (and the permissions Traefik needs) to our cluster; those are objects that Traefik understands and needs in order to function properly. Here is my file, you can copy paste it; let's name it 'ingressRouteDefinition.yml':
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressroutes.traefik.containo.us
spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRoute
    plural: ingressroutes
    singular: ingressroute
  scope: Namespaced
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: middlewares.traefik.containo.us
spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: Middleware
    plural: middlewares
    singular: middleware
  scope: Namespaced
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressroutetcps.traefik.containo.us
spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRouteTCP
    plural: ingressroutetcps
    singular: ingressroutetcp
  scope: Namespaced
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressrouteudps.traefik.containo.us
spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRouteUDP
    plural: ingressrouteudps
    singular: ingressrouteudp
  scope: Namespaced
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: tlsoptions.traefik.containo.us
spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TLSOption
    plural: tlsoptions
    singular: tlsoption
  scope: Namespaced
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: tlsstores.traefik.containo.us
spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TLSStore
    plural: tlsstores
    singular: tlsstore
  scope: Namespaced
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: traefikservices.traefik.containo.us
spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TraefikService
    plural: traefikservices
    singular: traefikservice
  scope: Namespaced
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller
rules:
  - apiGroups:
      - ""
    resources:
      - services
      - endpoints
      - secrets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
      - networking.k8s.io
    resources:
      - ingresses
      - ingressclasses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - traefik.containo.us
    resources:
      - middlewares
      - ingressroutes
      - traefikservices
      - ingressroutetcps
      - ingressrouteudps
      - tlsoptions
      - tlsstores
    verbs:
      - get
      - list
      - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik-ingress-controller
subjects:
  - kind: ServiceAccount
    name: traefik-ingress-controller
    namespace: default
As usual we must now apply this to the cluster, same as before:
kubectl apply -f ingressRouteDefinition.yml
Here you will see that a bunch of objects have been created; that's normal and to be expected.
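If you are curious, you can list the custom resource definitions that were just created:
kubectl get crd | grep traefik.containo.us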
Now we must create a deployment and a service account so that Traefik can work. You can do so by applying yet another YAML file (everything is YAML in kube). Here is the file; just make sure to check the comments and change things according to your needs. Also note that I am installing Traefik in the default namespace; you can change that if you want, just make sure to create the namespace beforehand. Let's call this one traefik.yml:
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: default
  name: traefik-ingress-controller
---
kind: Deployment
apiVersion: apps/v1
metadata:
  namespace: default
  name: traefik
  labels:
    app: traefik
spec:
  replicas: 1
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      serviceAccountName: traefik-ingress-controller
      containers:
        - name: traefik
          image: traefik:v2.3 # you might have a higher version by the time you read this
          args:
            # This should not be set when you are in production, as it exposes a dashboard that can be accessed by anyone,
            # but for our test needs it is great; just remember to remove it
            - --api.insecure=true
            - --accesslog
            # Here we define our entry points: one on 80 (we call it web) and one on 443 (we call it web-secure)
            - --entrypoints.web.Address=:80
            # Traefik handles automatic redirection from http to https and it's done like so;
            # feel free to comment these out at first if you want to test your http endpoint
            - --entrypoints.web.http.redirections.entryPoint.to=web-secure
            - --entrypoints.web.http.redirections.entryPoint.scheme=https
            - --entrypoints.web.http.redirections.entrypoint.permanent=true
            - --entrypoints.web-secure.Address=:443
            # I still need to read more about providers, but basically we need this one
            - --providers.kubernetescrd
            # This part is what will generate our SSL certificates.
            # I invite you to read a bit more about it; you will need your own domain name in order to use it.
            # Traefik has nice documentation on the different options, but in the meantime here is what I used
            - --certificatesresolvers.certresolver.acme.tlschallenge # several challenge types exist, pick the one you prefer/need
            - --certificatesresolvers.certresolver.acme.email=your@email.com # replace this with your mail
            # This file will store our certificates. I do not use a volume to store it, so every time Traefik restarts it is destroyed.
            # This is fine for our dev purposes, but you will want a volume when you go live with your cluster.
            # It is important to know that we will use Let's Encrypt, which limits how many certificates we can request:
            # if you ask for more than X certificates for test.domain.com you will be throttled and will have to wait for your certificate.
            # Here we are going to use the staging server, so we can make sure the certificate validation works, and then we will remove the staging part and go for real certs
            - --certificatesresolvers.certresolver.acme.storage=acme.json
            # Here we set who gives us certificates; we choose the staging server, as explained above, because we want to make sure it works first.
            # To get real certificates, remove "staging" from the link below
            - --certificatesresolvers.certresolver.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory
          # Here we open the ports we will use to reach Traefik later:
          # the usual 80 and 443, but also 8080, which is where the Traefik dashboard is deployed.
          # For testing it is OK, but remember we set this as insecure, so in the future we must protect it or simply not make it accessible
          ports:
            - name: http
              containerPort: 80
            - name: https
              containerPort: 443
            - name: admin
              containerPort: 8080
And once again we must apply that:
kubectl apply -f traefik.yml
Now Traefik should be running (validate it with kubectl get deployment to make sure), but it is not yet accessible to you; it's still trapped inside the cluster. In order to access it we need a service (of type LoadBalancer, and here is where MetalLB comes in handy). Here is the service file; let's call it service.yml:
apiVersion: v1
kind: Service
metadata:
  name: traefik
  annotations:
    metallb.universe.tf/address-pool: <name-of-your-ip-pool-defined-above>
spec:
  ports:
    - port: 80
      targetPort: 80
      name: http
    - port: 443
      targetPort: 443
      name: https
    - port: 8080
      targetPort: 8080
      name: admin
  selector:
    app: traefik
  type: LoadBalancer
You know the drill we must apply the little fellow:
kubectl apply -f service.yml
Now the magic is almost about to happen !
OK, now we have a setup where we can create certificates for our services and so on. In order to keep things simple, however, we will use an image that is already available to us: we will now set up nginx (you don't really need to know what it is) on our custom domain and validate the certificate.
A bunch of things we must do first, though. Let's say we want to expose test.domain.com: we must tell our computer to redirect test.domain.com to the entry point of our cluster, which in our case is the service we just created for Traefik! By running this command:
kubectl get service
You should see a line like so:
traefik LoadBalancer 10.43.79.253 <the-ip-we-want> 80:31465/TCP,443:32256/TCP 21h
Notice the second IP (the EXTERNAL-IP column): this is the IP of the service that redirects to Traefik, aka what we want.
So now you must edit your hosts file so that test.domain.com points to this IP. You can see how to do that here: https://www.howtogeek.com/howto/27350/beginner-geek-how-to-edit-your-hosts-file/
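For example, on Linux or Mac the line you add to /etc/hosts looks like this (using the EXTERNAL-IP you noted above and the hostname you want to use):
<the-ip-we-want> test.domain.com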
Once this is done, if you try to access it... you won't see anything; that's expected :D But you should be able to access :8080 and see Traefik alive. If that works, you are golden!
Now what we must do is add something that answers when we query test.domain.com and not just a blank 404 page.
Here we go, installing nginx by doing the following:
First we must create a deployment; let's call it ndeploy.yml:
kind: Deployment
apiVersion: apps/v1
metadata:
  namespace: default
  name: nginx
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - name: http
              containerPort: 80
Apply it with:
kubectl apply -f ndeploy.yml
Now we need a service so that traffic can be taken to that deployment; let's call it nservice.yml:
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  ports:
    - name: http
      port: 80
  selector:
    app: nginx
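Don't forget to apply this one as well:
kubectl apply -f nservice.yml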
Finally, we must tell Traefik that we want to reach this service when we hit test.domain.com. This is done via a route; there is much more information on the Traefik website, so check it out there if this part feels complicated. Note that the entry point names in the route must match the ones we declared in the Traefik arguments (web and web-secure). Let's call this one nroute.yml:
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: nginx-route
  namespace: default
spec:
  entryPoints:
    - web # this must match the entry point name declared in the Traefik arguments
  routes:
    - match: Host(`test.domain.com`)
      kind: Rule
      services:
        - name: nginx-service
          port: 80
---
# Here we are defining two routes, one in http and another one in https.
# If you kept the global http to https redirection you don't need the http route,
# as all traffic will be redirected automatically.
# I just left it here for you to see the difference between the two
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: nginx-route-secure
  namespace: default
spec:
  entryPoints:
    - web-secure # again, this matches the 443 entry point declared in the Traefik arguments
  routes:
    - match: Host(`test.domain.com`)
      kind: Rule
      services:
        - name: nginx-service
          port: 80
  tls:
    certResolver: certresolver
What we are saying here is simple: we tell Traefik that we want to reach nginx-service when we arrive with test.domain.com, in HTTP or HTTPS.
Apply it with:
kubectl apply -f nroute.yml
If everything has been done correctly, you can now open your browser and go to test.domain.com
You will see a warning: that's because the certificate probably won't validate, since we are still using the staging server and you probably don't own test.domain.com. That is normal; you can bypass the warning and reach your site.
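If you prefer the command line you can also test it with curl; the -k flag tells curl to accept the untrusted staging certificate:
curl -k https://test.domain.com
You should get the default nginx welcome page back.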
Now you must just confirm that everything is working and remove the staging server from traefik and replace it with the production server so you can start having your own certificates.
If you don't have a domain name, you can also decide not to implement the certificate resolver by removing those arguments from the Traefik deployment. Doing so means no certificates will be resolved and you will get warnings when accessing your sites, but as long as your hosts file points to the Traefik service correctly within your local network, you can have your own URLs for your own apps and everything will run just fine.
Just a tip: don't bother with HTTPS if you don't need certificates, or if your cluster is never exposed to the internet. You can run everything on HTTP and still have your own URLs; just remember there won't be any encryption between you and your apps if that's what you decide to do.
OK, so in this update we will add volumes to our pods so that we can keep data on disk when pods restart. If you followed this tutorial so far, you now have a Traefik instance that automatically gets you a certificate once a new route is created. This is super convenient, but once you remove the "staging" Let's Encrypt server you might hit a certificate limit, i.e. you asked too many times for the same certificate.
That happens because every time your Traefik instance goes down (for whatever reason) and comes back up, it asks again for the certificates of the various routes you created, reaching your limit very quickly.
In order to avoid that we will create three things (and again we will do so without any automation so we really understand what is happening):
- A volume
- A volume claim
- A traefik deployment with updated parameters
OK, so let's go do that! BUT before anything else: I am not going into NFS territory here; I assume you already have an NFS server, with users that have the right to use a specific volume on it. For the rest of this tutorial I consider the following:
- You have an NFS server with a folder in it that you have access to
- You have a user ID and GID at hand and this user has permissions to use the NFS share
In my case I created a volume on my NAS and gave a user "kube" the right to access this folder. This kube user has an ID of 1040 and a GID of 200, but of course yours may vary (and those values are made up for this tutorial). I also made my NFS share accept direct connections from my Kubernetes IPs and local machine IPs, so I can access the share directly without being bothered by credentials. This varies per NAS distribution so I can't really help you set it up, but a quick search for "NFS on my XXX NAS" should help you.
OK so now we can tackle the kube side of things.
First things first, we will create a volume; this object is the one that maps your NFS share and makes it accessible in Kubernetes. In my example below I created a volume called traefik-data-pv and gave it a capacity of 5 gigs. I also set the "ReadWriteOnce" access mode, which means only one claim can be made on that volume (more on that later). I also fly over storage classes here and just use the default; you may want to read more about them, but let's just say I don't have different kinds of storage. Usually you would use a storage class to distinguish, say, SSD and HDD backed storage, so that some volumes are automatically stored on SSDs, that kind of thing; think of it as premium storage, basic storage, medium storage and so on. Not so relevant for me though. The interesting part is the "nfs" section, where you specify a directory on your NFS volume and the IP of your NFS server.
## PV traefik data
apiVersion: v1
kind: PersistentVolume
metadata:
  name: traefik-data-pv
spec:
  capacity:
    storage: 5Gi
  storageClassName: "local-path"
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  mountOptions:
    - hard
  nfs:
    path: "/vl3/Kubernetes/traefik" # insert here your NFS path where traefik will save data
    server: "XX.XX.XX.XX" # insert here your NFS server IP
Now that this is done, we can create a "claim", an object that will basically use the volume; the claim acts as a kind of bridge between the volume and a Kubernetes pod. Think of it as plugging a USB drive into the pod, in a way. Note that the claim must match the volume you created: you can't have a claim that asks for more storage than the volume offers. You can have a claim that asks for less, but remember a volume can only support one claim, so you might as well use all the storage.
Here is my claim (also note that it references the previous volume):
## PVC traefik data
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: traefik-data-pvc
spec:
  volumeName: traefik-data-pv
  resources:
    requests:
      storage: 5Gi
  storageClassName: "local-path"
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
Now, as usual, we can save those definitions in a file like "volume.yml" and apply it with a good old:
kubectl apply -f volume.yml
Now the important check: if you do
kubectl get pvc
You should see your claim with the status "Bound"; this means the claim found the volume and the volume is OK! AKA you are heading in the right direction.
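The output looks something like this (the exact columns may vary with your kubectl version):
NAME               STATUS   VOLUME            CAPACITY   ACCESS MODES   STORAGECLASS   AGE
traefik-data-pvc   Bound    traefik-data-pv   5Gi        RWO            local-path     1m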
Now for the final step, we want to make sure Traefik saves the certificates on that volume. Here is an updated version of the Traefik deployment file that does just that (I removed the explanation comments written previously so we can focus on the volume part of things; feel free to scroll up for more info):
---
kind: Deployment
apiVersion: apps/v1
metadata:
  namespace: default
  name: traefik
  labels:
    app: traefik
spec:
  replicas: 1
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      # Now, this security context will vary according to how you have set things up on your NFS share.
      # If you created a share and said "this IP is allowed to do everything" then you don't need the security context.
      # If on the other hand you said "no, only this user is allowed to do something" then you must tell your pod to act as this user,
      # otherwise you will run into errors such as "permission denied" when the pod tries to save data on the NFS share.
      # In my case I have a double safety: only this IP and this account can use my share, so I must specify here the "kube" account I created earlier on my NAS
      securityContext:
        runAsUser: 1040 # my kube user ID
        runAsGroup: 200 # my kube group ID
      serviceAccountName: traefik-ingress-controller
      containers:
        - name: traefik
          image: traefik:v2.3
          args:
            - --accesslog
            - --entrypoints.web.Address=:80
            - --entrypoints.web.http.redirections.entryPoint.to=web-secure
            - --entrypoints.web.http.redirections.entryPoint.scheme=https
            - --entrypoints.web.http.redirections.entrypoint.permanent=true
            - --entrypoints.web-secure.Address=:443
            - --providers.kubernetescrd
            - --certificatesresolvers.certresolver.acme.tlschallenge
            - --certificatesresolvers.certresolver.acme.email=your@email.com # replace this with your mail
            # If I had left acme.json at the root, Kubernetes would have seen it as a folder even with a sub path on my volume mount, so I decided to
            # put the acme.json file in a folder called certs at the root of this container and it works just fine
            - --certificatesresolvers.certresolver.acme.storage=/certs/acme.json
            # Careful here: I removed the staging server and put the real one, because I already know this works for me;
            # make sure you use staging first so you can test things
            - --certificatesresolvers.certresolver.acme.caserver=https://acme-v02.api.letsencrypt.org/directory
          ports:
            - name: http
              containerPort: 80
            - name: https
              containerPort: 443
          # Here is the interesting part: I am defining a mount point so that in the container the path /certs is actually mounted on the nfs-data volume,
          # which I add to the pod spec further down
          volumeMounts:
            - name: nfs-data
              mountPath: /certs
      # Here is the nfs-data volume, and as you can see it references the volume claim created earlier.
      # Nothing really fancy; once you understand it, it is quite straightforward
      volumes:
        - name: nfs-data
          persistentVolumeClaim:
            claimName: traefik-data-pvc # the name of the claim you created earlier
OK, so at this point, once you "kubectl apply -f" this new Traefik deployment, you should see it start up, and if your permissions and account are set up correctly you will see acme.json pop up in your NFS folder.
If this is not the case check the logs of traefik as they are rather explicit:
kubectl logs -f <traefik pod>
OK, so now you have not only a multi-node Kubernetes cluster, but also a reverse proxy with SSL certificates AND NFS storage! You are pretty much rock solid to start whatever apps you might want!
In this section the goal is to set up SSO (single sign-on) with two factor authentication for every service that you want to protect. This has some advantages: for instance, if you deploy an app like Homer (a dashboard to present different links) that doesn't come with a built-in authentication mechanism, you don't have to worry about it, because something else handles the authentication for you.
This way you have one piece of software telling the other services whether you are authenticated, and if you want you can happily disable the login screens of your other services, because they are no longer required.
The flow that we want to put in place is the following:
- You want to access home.domain.com (let's say a Homer instance)
- You are first redirected to https via what we put in place earlier.
- Once in https you are then redirected to an authentication server.
- This server checks whether you are logged in; if you are, you don't even see anything and you reach your initial address directly
- If you are not, however, you are asked to log in
- After a successful login you are redirected to the initial address
This however has some pre-requisites that are important:
- You don't necessarily need a domain; however, if you don't have one, you need to change your hosts file so that example.domain.com and whatever.domain.com are redirected to different apps. In this tutorial we will set up an authentication server on auth.domain.com and a test app on home.domain.com, so make sure you have that ready!
What tool will we use for the job... There are many tools able to do this! I went through a lot of them, and I figured that the easiest one for us in a homelab environment is called Authelia.
Authelia is exactly what we need for the job: it's fast and secure, and even though it is not as huge as a Keycloak server for instance, that also means it is way lighter to run and easier to understand, which for a homelab is perfect ;)
So, as usual with every app we want to run on kube, we have to add a couple of things: services, deployments, etc. You know the drill, you just have to kubectl apply the different files, so here we go.
We start with a namespace; we will use it to keep everything related to authentication inside its own space:
apiVersion: v1
kind: Namespace
metadata:
  name: authentication
Then we need a service so we can reach our application:
apiVersion: v1
kind: Service
metadata:
  namespace: authentication
  name: authelia-service # give it the name that you want
spec:
  ports:
    - name: http
      port: 9091 # our service exposes port 9091 (it's redundant to specify both, but for the sake of understanding I spell it all out)
      protocol: TCP
      targetPort: 9091 # our service targets port 9091 on the container running behind it
  selector:
    app: authelia # we want to reach authelia
Once we have that, we need volumes and the associated volume claims in order to keep data on the NFS server we set up in the previous steps of this tutorial. There are two volumes and two volume claims; in this example I store the config and the secrets in two different places, but you can choose otherwise if you want:
apiVersion: v1
kind: PersistentVolume
metadata:
  namespace: authentication
  name: authelia-config-pv # you can specify another name if you want
spec:
  capacity:
    storage: 10Gi # 10 gigs may be too much, you might decide to give it less
  storageClassName: "local-path"
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce # specify here that only one claim will be able to use this volume
  mountOptions:
    - hard
  nfs:
    path: "your/path/to/volume/authelia/" # replace here with the destination folder where authelia will store data (inside your NFS server)
    server: "XX.XX.XX.XX" # replace here with your server IP
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: authentication
  name: authelia-config-pvc
spec:
  volumeName: authelia-config-pv # here as usual we reference the volume we created previously
  resources:
    requests:
      storage: 10Gi # and we make sure that they both have the same size
  storageClassName: "local-path"
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
---
# the volume and claim below are used to store the "secret" files, in order to set up authelia without revealing secret information
apiVersion: v1
kind: PersistentVolume
metadata:
  namespace: authentication
  name: authelia-secret-pv # you can always change those names
spec:
  capacity:
    storage: 10Gi
  storageClassName: "local-path"
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  mountOptions:
    - hard
  nfs:
    path: "/path/to/your/secret/location"
    server: "XX.XX.XX.XX" # your NFS server IP
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  namespace: authentication
  name: authelia-secret-pvc
spec:
  volumeName: authelia-secret-pv
  resources:
    requests:
      storage: 10Gi
  storageClassName: "local-path"
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
Now that this is done we just need a couple more things, like the deployment:
kind: Deployment
apiVersion: apps/v1
metadata:
  namespace: authentication
  name: authelia
  labels:
    app: authelia
spec:
  replicas: 1
  selector:
    matchLabels:
      app: authelia
  template:
    metadata:
      labels:
        app: authelia
    spec:
      securityContext:
        runAsUser: 1040 # as seen previously, this is my NFS kube user ID (yours will vary of course)
        runAsGroup: 200 # and this is the kube group ID, also seen previously
      containers:
        - name: authelia
          image: authelia/authelia # here we just take the latest version
          env:
            # Here we specify where the JWT secret required by authelia lives.
            # Everything under /app/secrets is mapped to your NFS share (see the volume mounts below), so you must have a file named NFS-JWT-FILE
            # (you can of course change the name) under the secret location defined above in the volume.
            # This file must only contain the following: jwt_secret=your-super-secret-key-that-you-must-define-by-yourself
            - name: AUTHELIA_JWT_SECRET_FILE
              value: /app/secrets/NFS-JWT-FILE
            # Same as above but for the session; this file must contain: secret=your-super-secret-blablabla-you-got-it
            - name: AUTHELIA_SESSION_SECRET_FILE
              value: /app/secrets/SESSION-FILE
          ports:
            - containerPort: 9091 # we open port 9091 here
              name: http
          volumeMounts:
            - name: nfs-data
              mountPath: /config # we map the volume required for the config file
            - name: nfs-secret
              mountPath: /app/secrets # we map here the volume required for the secrets
      volumes:
        - name: nfs-data
          persistentVolumeClaim:
            claimName: authelia-config-pvc # claims defined previously
        - name: nfs-secret
          persistentVolumeClaim:
            claimName: authelia-secret-pvc
Now don't apply it just yet.
Now we have to create a configuration file for Authelia, which must be stored at the root of our config volume and named configuration.yml
There is an example configuration file here: https://github.com/authelia/authelia/blob/master/compose/local/authelia/configuration.yml and I also invite you to read this: https://www.authelia.com/docs/configuration/ which describes every option available to configure Authelia according to your needs.
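To give you an idea, here is a minimal sketch of a configuration.yml, loosely based on the example linked above; it assumes a file-based user list (a users_database.yml sitting next to it in the config volume) and local SQLite storage instead of the Postgres/Redis I use, so treat it as a starting point only and check the documentation for each option:
host: 0.0.0.0
port: 9091

default_redirection_url: https://home.domain.com

authentication_backend:
  file:
    path: /config/users_database.yml

access_control:
  default_policy: deny
  rules:
    - domain: "*.domain.com"
      policy: one_factor

session:
  name: authelia_session
  expiration: 3600
  inactivity: 300
  domain: domain.com # must be your root domain

regulation:
  max_retries: 3
  find_time: 120
  ban_time: 300

storage:
  local:
    path: /config/db.sqlite3

notifier:
  filesystem:
    filename: /config/notification.txt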
As for myself, I decided to use a Postgres database to store data and a Redis database to store sessions; since this can be different for each person, I will let you decide what is best for you and your cluster.
Once you have this figured out you can apply the deployment and see if it boots up correctly (again via kubectl get deployment -A)
The next step is to create a Traefik route so we can access our authentication server via something like auth.domain.com. Here is how you do it:
kind: IngressRoute
apiVersion: traefik.containo.us/v1alpha1
metadata:
  name: authelia-route-secure
  namespace: authentication
spec:
  entryPoints:
    - web-secure # the https (443) entry point declared in the Traefik arguments
  routes:
    - match: Host(`auth.domain.com`) # of course change it according to your domain
      kind: Rule
      services:
        - name: authelia-service # we want to reach the authelia service
          port: 9091 # on port 9091
  tls:
    certResolver: certresolver # we want automatic SSL as described in past sections
OK, so by now we have almost every piece of the puzzle; we just need to let Traefik know what to do. This is called forward auth.
Forward auth works in a very simple way. In this example, since we want to secure specific services, we will first create a middleware and apply it to individual Traefik routes; alternatively, you can apply it to every route at once by attaching the middleware to the https entry point. You decide. Let me show you how to secure one specific endpoint.
First as we said we need the middleware:
apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: authelia # here I am leaving it in the default namespace, but you can change that if you want to
spec:
  forwardAuth:
    # The address is a bit tricky: we want the forward auth to point to the service we created earlier, named authelia-service,
    # but this service lives in another namespace.
    # In kubernetes, to reference a service in another namespace you must use its internal DNS name, which is written like so:
    # service-name.namespace.svc.cluster.local
    # This is why you see this complicated URL under address.
    # We also target port 9091 and the path required by authelia, /api/verify?rd=TheURLWhereAutheliaIsRunning.
    # It took me a long time to understand this, but yes, you need all of that.
    address: http://authelia-service.authentication.svc.cluster.local:9091/api/verify?rd=https://auth.domain.com/
    trustForwardHeader: true
    # Authelia requires these headers to work properly; basically you pass information about the logged-in user back to your apps.
    # You can add more if you like, but check the authelia documentation for more info
    authResponseHeaders:
      - Remote-User
      - Remote-Groups
      - Remote-Name
      - Remote-Email
OK so by now we have all but one piece of the puzzle here !
We just need to secure a Traefik route so that it goes through this middleware. Here I am taking a random route that redirects to a Homer dashboard; of course you must have the dashboard actually running and reachable for this to work.
Here is an example of the new route:
kind: IngressRoute
apiVersion: traefik.containo.us/v1alpha1
metadata:
  name: homer-route-secure
  namespace: dashboard
spec:
  entryPoints:
    - web-secure
  routes:
    - match: Host(`home.domain.com`)
      kind: Rule
      services:
        - name: homer-service # we redirect to a hypothetical homer service
          port: 8080 # on port 8080
      # Here is the important part: we add to this route the middleware we just created above. The name is
      # namespace-nameOfTheMiddleware@kubernetescrd; the @kubernetescrd suffix is there because we set up Traefik with the --providers.kubernetescrd argument
      middlewares:
        - name: default-authelia@kubernetescrd # the middleware named authelia, in the default namespace
  tls:
    certResolver: certresolver
Finally, if you've done everything correctly, you can now access Homer and see the Authelia login screen! You can log in and then be redirected to your Homer dashboard!
This documentation is a draft as of today: I haven't added any pictures yet, haven't fully corrected the language, etc. It is a work in progress and will hopefully be improved further eventually...
However, not so long ago, when I had no idea about containers, Docker and Kubernetes, I set myself a goal: to help you guys avoid struggling like me and spending months and months (thanks COVID for the free time :/) figuring everything out by yourself... I hope this helps you, and if you have any problems you can reach me here via an issue or on my reddit https://www.reddit.com/user/SirSirae, feel free to message me.
Summing it up, we now have a multi-node kube cluster, a reverse proxy with automatic SSL and HTTPS redirection, and an SSO to secure every service we want to serve on our cluster! Now you can pretty much do whatever you want on your cluster and host whatever apps you want; they will all be secured!
Achievement unlocked: understand kubernetes before the end of 2020 !!!!!