Vanilla Kubernetes - High level design
This guide tries to describe, briefly and from my own point of view, a vanilla Kubernetes cluster; as vanilla and as simple as possible. I felt this was a good starting point for learning k8s.
Everything I learn is also codified into this Ansible playbook repo that I use to get started with a vanilla kubeadm-based cluster.
Everything will be listed somewhat in a chronological order of which service or concept you should learn about first.
The goal of this guide is to have a working three node k8s cluster with a deployed application accessible from the host system or parent network.
This assumes you're already familiar with:
- Containers
- Docker
- cgroups in the Linux kernel
- Podman to some extent (the fact that docker isn't the only cgroup interface)
- Basic application development (Flask/NodeJS Express quickstart guides would be enough)
- How the most basic Dockerfile works
- The purpose of docker registry servers
- Using Ansible and Vagrant
- iptables and general Linux networking like routes, pseudo NICs and forwarding
General Terminology
- k8s is just an alias for Kubernetes.
- Control plane is the master server in a k8s cluster.
- Worker is a node in a k8s cluster.
- Pod is a unit of one or more containers that need to share resources. For example, if a service needs a local redis you could run one in its own container in the same pod as the service container; they can also share a data volume. See the sketch below for what that could look like.
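Here's a minimal sketch of such a pod as a manifest; the names, images and the shared ''emptyDir'' volume are just illustrations, not something deployed later in this guide.

apiVersion: v1
kind: Pod
metadata:
  name: app-with-redis
spec:
  volumes:
  # Scratch volume shared by both containers in the pod
  - name: shared-data
    emptyDir: {}
  containers:
  - name: app
    image: docker.io/stemid/flask-boilerplate:latest
    volumeMounts:
    - name: shared-data
      mountPath: /data
  - name: redis
    image: redis:5
    volumeMounts:
    - name: shared-data
      mountPath: /data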
Setup
I've made myself these Ansible playbooks, hosted on Gitlab.com, to set up a kubeadm cluster. They've been extensively tested on Vagrant VMs and deployed on actual VMs running CentOS 7.
Follow the instructions in the README file to get a working cluster in your own Vagrant environment. As of writing it only supports libvirt, because I run it on a Linux host.
Once complete you should be able to log in to the master node and run some kubectl commands to verify that it works.
$ kubectl get pods -A
Setup terminology
- kubeadm is the tool used to init a new cluster, or join new worker nodes.
- kubelet is the k8s service running on all nodes, master and worker.
- kubectl is the CLI communicating with the k8s API to perform lookups and changes in the cluster.
- kube-apiserver is the cluster API; in a kubeadm cluster it runs as a static pod on the master node and is what actually controls the cluster.
Deployment
I'll be using my own flask boilerplate example service to deploy; its source repo is here on Gitlab.com and its docker image is here on the hub.docker.com registry.
Deployment terminology
- Deployment is a way to start one or more pods in the k8s cluster.
- A deployment specifies, for example, a container image, a name, labels and a container port.
- Label is a way to find and use objects inside k8s.
- Service is a way to expose a deployment to the cluster using a proxy and port forwarding.
- Namespace is a way to group many things like deploys, services, ingress and more together.
- The big advantage of k8s is that you can apply a deployment manifest from a build pipeline and k8s will update the container image used to the version specified in your manifest.
Namespace
Make a habit of creating namespaces for your deployments; this makes network policies easier later when you want to isolate deployments from each other.
$ kubectl create namespace flask-boilerplate
Or using a yaml manifest.
apiVersion: v1
kind: Namespace
metadata:
  name: flask-boilerplate
  labels:
    name: flask-boilerplate
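As a taste of the network policy isolation mentioned above, here's a minimal sketch (not part of the playbooks) that only allows traffic from pods in the same namespace; it requires a network plugin that enforces policies, like Calico.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-from-other-namespaces
  namespace: flask-boilerplate
spec:
  # Empty podSelector matches all pods in the namespace
  podSelector: {}
  ingress:
  - from:
    # Only allow ingress from pods in this same namespace
    - podSelector: {}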
Labels
Labels are very important; they're how a service knows which pods it's supposed to route to, and how an ingress knows which service it's routing traffic for.
So while object names can differ, like the name of a deployment or a service object, the labels must match.
Labels are set in the metadata sections of manifests (read more about manifests later), and in selector sections you query for objects matching those labels.
So when deploying an application you set its label in the metadata section, and in all subsequent objects like service and ingress you query that same label.
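As an excerpt from the full manifests shown later under Deploy manifest, the correspondence looks like this:

# In the deployment manifest the pod template is labeled:
metadata:
  labels:
    app: flask-boilerplate

# In the service manifest the selector queries that label:
spec:
  selector:
    app: flask-boilerplate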
Manual deployment
This pulls a docker image and deploys it in a pod of your cluster.
$ kubectl create deployment flask-boilerplate --image=docker.io/stemid/flask-boilerplate -n flask-boilerplate
You can also try it with ''--dry-run -o yaml'' to see what a yaml manifest might look like, more about using manifests under Deploy manifest.
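For example (note that newer kubectl versions spell the flag ''--dry-run=client''):

$ kubectl create deployment flask-boilerplate --image=docker.io/stemid/flask-boilerplate -n flask-boilerplate --dry-run -o yaml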
Now you can use ''kubectl get deploy'' or ''kubectl get pods'' to see the results of your deployment.
Here's how you create a service to expose the pod to your cluster.
$ kubectl create service nodeport flask-boilerplate --tcp=80:5000 -n flask-boilerplate
Now you can see the internal cluster IP assigned to your service.
$ kubectl get service -n flask-boilerplate
NAME                TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
flask-boilerplate   NodePort   10.108.194.8   <none>        80:31472/TCP   11m
And test the app you deployed there.
$ curl -sLD - 'http://10.108.194.8/api/v1/health'
HTTP/1.0 308 PERMANENT REDIRECT
Content-Type: text/html; charset=utf-8
Content-Length: 275
Location: http://10.108.194.8/api/v1/health/
Server: Werkzeug/1.0.0 Python/3.7.7
Date: Sun, 15 Mar 2020 10:52:21 GMT
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 16
Server: Werkzeug/1.0.0 Python/3.7.7
Date: Sun, 15 Mar 2020 10:52:21 GMT
{"status":"OK"}
This means that iptables rules and routes set up by Calico (during the Ansible run) are routing your traffic from any k8s node to the correct pod and container.
If you're curious about which node this pod is running its container on, the easiest way is ''kubectl get pods -o wide'', which shows the node each pod was scheduled to. You could also log in to every worker and run ''docker ps -a'' until you find it, or check Container ID in the output of ''kubectl describe pods'' and grep for that container ID (the first 6-8 chars) on all the workers.
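For example:

$ kubectl get pods -n flask-boilerplate -o wide

The NODE column tells you where each pod is running.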
Delete deployment
$ kubectl delete service flask-boilerplate -n flask-boilerplate
$ kubectl delete deployment flask-boilerplate -n flask-boilerplate
This will shut down pods and containers until nothing is left.
Deploy manifest
A more automated way of deploying apps and services is to define them in yaml manifest files.
Here are manifests for the commands entered manually above: one for the namespace, one for a deployment where I define a docker image, and one for a service where I define which port to use in the deployed container.
Note that you can get a bit more creative with labels here. Previously the label defaulted to the name of the deployment or service, but here we can use one name for the deployment object and another for the label.
apiVersion: v1
kind: Namespace
metadata:
  name: flask-boilerplate
  labels:
    name: flask-boilerplate
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flask-boilerplate-deployment
  namespace: flask-boilerplate
  labels:
    app: flask-boilerplate
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flask-boilerplate
  template:
    metadata:
      labels:
        app: flask-boilerplate
    spec:
      containers:
      - name: flask-boilerplate
        image: docker.io/stemid/flask-boilerplate:latest
        ports:
        - containerPort: 5000
Note the use of ''nodePort: 30080'' here, which exposes TCP/30080 on each node, connecting you directly to your deployed service. This is done via kube-proxy.
apiVersion: v1
kind: Service
metadata:
  name: flask-boilerplate
  namespace: flask-boilerplate
spec:
  type: NodePort
  selector:
    app: flask-boilerplate
  ports:
  - protocol: TCP
    port: 80
    targetPort: 5000
    nodePort: 30080
Here's an example of how to apply these manifests and reproduce the results from the previous manual deployment.
$ kubectl apply -f flask-boilerplate-namespace.yaml
$ kubectl apply -f flask-boilerplate-deploy.yaml
$ kubectl apply -f flask-boilerplate-service.yaml
$ kubectl get svc -n flask-boilerplate
NAME                TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
flask-boilerplate   NodePort   10.96.86.216   <none>        80:30080/TCP   22m
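Since ''nodePort: 30080'' is bound on every node, you should now also be able to reach the service from the host system or parent network. The node IP below is just a hypothetical example from a libvirt network; use the IP of any of your own nodes.

$ curl -sL 'http://192.168.122.10:30080/api/v1/health/'
{"status":"OK"}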
Start a shell to test your service
Now you can test the internal connectivity using a curl shell.
$ kubectl run --generator=run-pod/v1 --image=curlimages/curl:7.69.1 -it curlshell -- /bin/sh
If you don't see a command prompt, try pressing enter.
/ $ curl -sLD - 'http://flask-boilerplate.flask-boilerplate/api/v1/health'
HTTP/1.0 308 PERMANENT REDIRECT
Content-Type: text/html; charset=utf-8
Content-Length: 321
Location: http://flask-boilerplate.flask-boilerplate/api/v1/health/
Server: Werkzeug/1.0.0 Python/3.7.7
Date: Thu, 16 Apr 2020 20:28:10 GMT
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 16
Server: Werkzeug/1.0.0 Python/3.7.7
Date: Thu, 16 Apr 2020 20:28:10 GMT
{"status":"OK"}
This demonstrates the DNS names you can use inside the cluster to contact any service. The pod network is naturally flat, and service DNS names follow the pattern service-name.namespace.svc.cluster.local; because the ''svc.cluster.local'' suffix is in every pod's DNS search domains you can omit it. So flask-boilerplate.flask-boilerplate is in fact the DNS name of the service I deployed.
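To verify, the fully qualified name should give the same result from the curl shell:

/ $ curl -s 'http://flask-boilerplate.flask-boilerplate.svc.cluster.local/api/v1/health/'
{"status":"OK"}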
Attach a shell to existing pod
If you ran the above command, exited, and want to re-attach to this shell without deleting the pod.
$ kubectl attach -it curlshell
Or start a new shell process in the same pod with ''kubectl exec -it curlshell -- /bin/sh''.
Exposing this service
Now you could expose the service by simply load balancing across all the nodes in an nginx upstream pool, or you could use an ingress controller and a k8s-approved load balancing service.
Delete deployment
Now delete them; note the different deployment name, and that the namespace must be given.
$ kubectl delete service flask-boilerplate -n flask-boilerplate
$ kubectl delete deployment flask-boilerplate-deployment -n flask-boilerplate
$ kubectl delete namespace flask-boilerplate
Service
- Anything that listens on a port and should be reachable needs a Service definition.
- To be available only internally in the cluster, a ClusterIP type Service definition is enough (see the sketch after this list).
- NodePort is a service type that binds a port on the host node to your service.
- Only things that you aim to expose outwards should have a Service definition of type NodePort.
- NodePort services are automatically assigned a high port, by default in the 30000-32767 range.
- In my setup it only makes sense to expose my ingress controller as a NodePort Service.
- You can deploy a Service object the same way as shown above.
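Here's a minimal sketch of a ClusterIP Service for the same deployment; the name is hypothetical, and since ClusterIP is the default type the ''type'' line could even be omitted.

apiVersion: v1
kind: Service
metadata:
  name: flask-boilerplate-internal
  namespace: flask-boilerplate
spec:
  type: ClusterIP
  selector:
    app: flask-boilerplate
  ports:
  - protocol: TCP
    port: 80
    targetPort: 5000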
Ingress
- Ingress is used to assign HTTP routes or host headers to specific services inside your cluster.
- The ingress controller is often the only service listening externally on each node through a NodePort service.
- ingress-nginx is the ingress controller I use; it's best for HTTP services.
- Traefik is another TCP proxy that can act as an ingress controller for protocols other than HTTP.
- An ingress definition tells the controller how to recognize incoming traffic and route it to the correct service (see the sketch after this list).
- An ingress controller does not require any special load balancer like MetalLB.
- Any basic kubernetes cluster should include an ingress controller and hide its services behind it.
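Here's a minimal sketch of an ingress definition that would route a hypothetical hostname to the flask-boilerplate service; on clusters running k8s 1.19 or later the apiVersion is ''networking.k8s.io/v1'' with a slightly different backend syntax.

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: flask-boilerplate
  namespace: flask-boilerplate
spec:
  rules:
  # Route requests carrying this host header to the service
  - host: flask.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: flask-boilerplate
          servicePort: 80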
Access control
RBAC
- This is Role Based Access Control.
- One common use is to generate users and tokens for CI/CD jobs that need to deploy services to a cluster.
CI/CD Access tokens
- The recommended method is to bind the built-in admin role inside a specific namespace.
- That token would then have full access to everything inside that namespace and nothing outside it; a sketch of what that could look like follows below.
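A minimal sketch, assuming a hypothetical ServiceAccount name ''ci-deployer''; the RoleBinding grants the built-in admin ClusterRole, but only within the namespace the binding lives in.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: ci-deployer
  namespace: flask-boilerplate
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: ci-deployer-admin
  namespace: flask-boilerplate
subjects:
- kind: ServiceAccount
  name: ci-deployer
  namespace: flask-boilerplate
roleRef:
  # Built-in ClusterRole, scoped to this namespace by the RoleBinding
  kind: ClusterRole
  name: admin
  apiGroup: rbac.authorization.k8s.io

The service account's token ends up in a secret in the namespace, which you can hand to your CI system.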
Upgrading
- So far I've had great success with the official kubernetes upgrade docs.
- I would highly recommend having a staging cluster to test upgrades on.
- I lock all the kubernetes packages to a specific version so they're not accidentally upgraded, for example as shown below.
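A sketch of how that could look; on CentOS 7 the versionlock plugin does the job, and on Debian-based systems ''apt-mark hold'' is the equivalent.

# CentOS 7
$ yum install yum-plugin-versionlock
$ yum versionlock kubelet kubeadm kubectl

# Debian/Ubuntu equivalent
$ apt-mark hold kubelet kubeadm kubectl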