This guide presents concepts and principles that are best considered during the design of a Kubernetes environment (abbreviated as K8s in this document). It aims to give the reader sound advice on how to run large workloads in a secure manner. Where possible the document remains vendor agnostic; it may, however, refer to specific cloud providers where useful, to show the reader how versatile K8s is and what approaches the cloud vendors have taken to offer more reliable solutions. It is assumed that the reader has a good understanding of containerisation and some knowledge of orchestration. This document should not be treated as a step-by-step guide to building a K8s cluster; there are plenty of resources online for that. Instead it aims to harness the great features K8s offers after the cluster has been stood up.
Once the cluster is up, the following areas need careful planning.
- Namespaces
- Requests and Limits
- Labels and Selectors
- Secrets
- Network Policies
NAMESPACES
Think of a namespace as a virtual cluster inside the Kubernetes cluster. Namespaces are isolated from each other and can help a project with organisation, security and even performance. You can deploy multiple namespaces that are all independent of each other. This allows teams to use the same resource name inside different namespaces, enabling scalability within a single K8s cluster. A typical out-of-the-box installation comes with three default namespaces: default, kube-system and kube-public. It is advised to leave these namespaces as they are, with the exception of default, which can be used for proof-of-concept and non-production workloads.
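A namespace is itself just an API object. A minimal manifest looks like the following sketch, where the name team-payments and its label are illustrative:

```yaml
# Minimal Namespace manifest; the name "team-payments" is an illustrative example
apiVersion: v1
kind: Namespace
metadata:
  name: team-payments
  labels:
    team: payments   # labels on namespaces are optional but useful for selection
```

Apply it with kubectl apply -f namespace.yaml, then target it when deploying workloads via the --namespace flag or a metadata.namespace field in the workload manifest.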
Although it was mentioned earlier that namespaces are isolated, this does not mean that pods in different namespaces cannot communicate with each other; they can. Built-in service discovery and DNS allow you to refer to services in other namespaces, for example my-service.my-namespace.svc.cluster.local.
Organisations often start out with a single cluster that handles all workloads, both production and non-production, with services developed by a single team. In this case it makes sense to define namespaces based on environment, for example namespaces for development, test and production, with network policies (see below) providing a measure of network isolation between the environments. Over time, though, organisations often spin up separate clusters for the environments, which offers genuine network separation and the potential to run larger workloads. In that case a good practice is to organise namespaces around teams and/or functional areas, particularly where application development follows Domain-Driven Design principles and occurs within bounded contexts; here, create a namespace per bounded context.
Always organise your clusters into namespaces and don't deploy pods into default. Use environments as the basis for namespaces where you only have a single cluster, and base namespaces on team, product or functional/business area when environments have been allocated dedicated clusters.
REQUESTS AND LIMITS
Requests and limits are the K8s features that control resources such as CPU and memory. A request is the amount of a resource the container is guaranteed to get: if a container requests a resource, K8s will only schedule it on a node that can provide it. A limit, on the other hand, prevents a container from going above a certain value; the container may use resources up to the limit, and is then restricted.
K8s uses the concept of millicores as a measure of CPU. If your application needs a whole core, assign 1000 millicores to the container. If your container needs a quarter of a core, assign 250 millicores. One thing to note: if your manifest states that your container needs, say, 8000 millicores but the largest node only has 4 cores, the scheduler will not be able to place the container. Unless the application is designed to utilise multiple cores, it is typically better to run more replicas of smaller containers than fewer, larger ones. This introduces more flexibility and reliability into the cluster.
Memory is allocated in bytes, where KB, MB, GB etc. are unit prefixes denoting kilo-, mega- or gigabytes in decimal, i.e. multiples of 1000, while KiB, MiB and GiB are the binary unit prefixes, i.e. multiples of 1024.
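To make the units concrete, here is a sketch of a container spec setting CPU in millicores and memory in binary units; the image name and the specific values are illustrative assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: backend
spec:
  containers:
  - name: app
    image: example/backend:1.1   # illustrative image
    resources:
      requests:
        cpu: 250m        # a quarter of a core, guaranteed by the scheduler
        memory: 256Mi    # 256 MiB, i.e. 256 x 1024 x 1024 bytes
      limits:
        cpu: 500m        # the container is throttled beyond half a core
        memory: 512Mi    # exceeding this gets the container OOM-killed
```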
Always use requests and limits to prevent the cluster running out of resources. Requests and limits work very well alongside Resource Quotas, whereby Resource Quotas are set at the namespace level and requests and limits at the container level. For example, in a 3-node cluster with an aggregate of 48GB of memory and 36 CPU cores, a Resource Quota could allow containers to be allocated up to 40GB of memory and 30 CPUs, leaving headroom for system components.
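The quota from the example above could be sketched as follows; the namespace name is an illustrative assumption:

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota
  namespace: production    # illustrative namespace
spec:
  hard:
    requests.cpu: "30"     # sum of all container CPU requests in the namespace
    requests.memory: 40Gi  # sum of all container memory requests in the namespace
    limits.cpu: "30"
    limits.memory: 40Gi
```

Once a quota is in place, pods created in the namespace must declare requests and limits for the quota-covered resources, or they will be rejected.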
LABELS AND SELECTORS
Labels are key – value pairs that are attached to K8s objects and are used to organise and select subsets of K8s objects. They allow users to map their own organisational structure onto their K8s clusters. Some example labels might be:
■ env: dev or env: prod
■ version: v1 or version: v2
In order to fully appreciate labels and selectors we need to understand the concept of Services, which the example below introduces. Every object in Kubernetes, and pods in particular, should be given labels. At a high level:
■ Labels are metadata you can assign to any API object.
■ They are a grouping mechanism for pods
■ They are used by selectors
Let’s take an example where we have a back-end application in a K8s cluster, in the production environment and at version 1.1. We label it env: prod, role: BE, version: 1.1. There are 2 replicas of this pod.
We then place a Service above these pods to allow the front-end application to discover the back-end pods.
We then use the same three labels in a selector on the Service object, which essentially creates a filter stating “find all pods with the labels env: prod, role: BE and version: 1.1 and route traffic to these pods”. Very quickly and simply we have tied the Service object to the back-end pods.
Taking the same example as above, suppose we place a new pod in the cluster with the labels env: dev and version: 1.0. Because these labels do not match the selector, the Service object will not route traffic to this pod.
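A sketch of the Service described above; the Service name and port numbers are illustrative assumptions:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend
spec:
  selector:          # only pods carrying all three labels receive traffic
    env: prod
    role: BE
    version: "1.1"
  ports:
  - port: 80         # port the Service exposes inside the cluster
    targetPort: 8080 # port the back-end container listens on
```

The pod labelled env: dev, version: 1.0 fails this selector, so it never appears in the Service's endpoints.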
Always use Labels and Selectors to tie your API objects together giving you both control and a way to quickly search for objects with common roles.
SECRETS
Kubernetes Secrets are another resource object, holding small amounts of sensitive information such as passwords, tokens or keys. The benefit of this setup is that we avoid storing sensitive information inside container images or configuration files. Once a pod is deployed it can reference secrets in these ways:
■ as files in a mounted volume
■ by the kubelet, for example when pulling images from a private registry
■ environment variables to be used by a container in a pod
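A sketch of a Secret and a pod consuming it as an environment variable; all names and values here are illustrative assumptions:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-credentials
type: Opaque
stringData:              # stringData accepts plaintext; K8s stores it base64-encoded
  password: s3cr3t
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: example/app:1.0   # illustrative image
    env:
    - name: DB_PASSWORD      # exposed to the container as an environment variable
      valueFrom:
        secretKeyRef:
          name: db-credentials
          key: password
```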
There is an issue with the default way that K8s stores secrets: they are held as base64-encoded plaintext, which means that with the right privileges a dump of the secret can be obtained and trivially decoded, since base64 is an encoding, not encryption.
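To see why base64 offers no protection, this is roughly what K8s stores for a Secret; any base64 tool reverses the value:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example
type: Opaque
data:
  password: cGFzc3dvcmQ=   # base64 of "password": trivially decoded, not encrypted
```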
Third Party Providers
Due to the growth and popularity of K8s, cloud providers and third parties are exposing their services for consumption by products such as K8s.
Microsoft Azure allows K8s to store its secrets in Azure Key Vault. Key Vault fully encrypts all data, so even privileged users cannot view or decrypt it. To use Azure Key Vault, the YAML manifest will need to contain the endpoint unix:///opt/azurekms.socket, which allows the pod to read, use and destroy keys.
HashiCorp provide a similar product, Vault, with some subtle differences. The key difference between this product and, say, Azure or Google Key Management Service is that you can run it anywhere, which means you can run it on-premises if that is one of the requirements.
Always use Secrets rather than storing sensitive data in plain text in images, configuration files or ordinary mounted volumes. Using a cloud provider's offering or a product from a company like HashiCorp is more robust, as they have added more controls, features and security to ensure the sensitive data cannot be tampered with.
Store sensitive information, such as passwords, in K8s secrets.
NETWORK POLICIES
In simple terms, Network Policies control traffic to and from pods. Network Policies use labels to select pods and define rules that specify what traffic is allowed to the selected pods. The default setting in K8s is that pods are non-isolated, meaning they accept traffic from any pod in the cluster: until a network policy is applied, any pod can communicate with any other pod. Applying network policies gives tight control over network communication. We can further constrain policies by namespace, IP range, and source and destination ports. Thinking in the traditional sense, one would never run a flat network with zero network ACLs; the notion should be no different in the containerised world.
One thing to note about Network Policies is that they require a network plugin on the K8s nodes that supports them. Deploying a Network Policy without such a plugin will have no effect.
Examples of such plugins include:
Weave Net: https://kubernetes.io/docs/tasks/configure-pod-container/weave-network-policy/
Network policies are defined at namespace level. Ingress rules define what traffic is allowed to pods in the namespace, and egress rules define what traffic is allowed from pods in the namespace. Traffic is allowed or disallowed based on matches against one of the following criteria:
- Pods that match certain criteria such as having a set of labels
- Namespaces that match certain criteria such as having a set of labels
- Pods and namespaces that match certain criteria such as having a set of labels
- Blocks of IPs within the K8s network
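A sketch combining the selector types above; the namespace, labels, CIDR and port are all illustrative assumptions:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
  namespace: production      # illustrative namespace
spec:
  podSelector:
    matchLabels:
      role: BE               # the policy applies to the back-end pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:           # allow pods in this namespace labelled as front end
        matchLabels:
          role: FE
    - namespaceSelector:     # or any pod in a namespace labelled env: prod
        matchLabels:
          env: prod
    - ipBlock:               # or traffic from this IP range
        cidr: 10.0.0.0/16
    ports:
    - protocol: TCP
      port: 8080             # only on the back end's listening port
```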
Always use Network Policies when deploying K8s clusters. This gives you robust control over pod communication. When pod communication leaves the K8s cluster, the firewalls and security appliances outside the cluster will take care of traffic filtering. A good practice we have used is to, by default, deny ingress from, or egress to, pods outside of a namespace.
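The default-deny practice mentioned above can be sketched as a policy with an empty pod selector; the namespace name is an illustrative assumption:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production   # illustrative namespace
spec:
  podSelector: {}         # empty selector matches every pod in the namespace
  policyTypes:            # listing both types with no rules denies all traffic
  - Ingress
  - Egress
```

Further, more permissive policies (such as the ingress example above) are then layered on top to allow only the traffic each workload actually needs.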