Azure Kubernetes node pools with Terraform
In Azure Kubernetes Service (AKS), nodes of the same configuration are grouped together into node pools. These node pools contain the underlying VMs that run your applications. The initial number of nodes and their size (SKU) is defined when you create an AKS cluster, which creates a system node pool. To support applications that have different compute or storage demands, you can create additional user node pools.
I just copied the above text from here because it says it just right. To get a full understanding of node pools, I encourage you to read the whole article. Also, if you plan to run Azure Kubernetes in production, I can recommend this article as well. It's all about the sizing, baby!
This post is about deploying node pools with Terraform. I assume a bit of prior knowledge about Azure and Terraform modules.
Because we run multiple instances of AKS, I wanted to make the number of node pools and their properties configurable. This article pointed me in that direction.
At work, we have a git repo with multiple cluster definitions (I treat them like cattle). The clusters are deployed in a Jenkins pipeline.
Terraform config
My goal is to create 3 node pools:
- a system node pool for system pods
- an infra node pool for infra pods (Vault, Elasticsearch and Prometheus to be precise)
- an app node pool for our LOB apps
If you define an AKS cluster following the Terraform documentation, you will notice that there is a default node pool block, but no way to define extra node pools inside the cluster resource. For that, there is a separate resource: azurerm_kubernetes_cluster_node_pool.
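For context, here is a trimmed-down sketch of the cluster resource itself. The names and values are placeholders (my real definition contains more settings); the default_node_pool block is the part that matters here:

```hcl
# Trimmed-down sketch of the cluster resource; names and values are placeholders.
resource "azurerm_kubernetes_cluster" "kube" {
  name                = "my-aks-cluster"
  location            = "westeurope"
  resource_group_name = "my-aks-cluster-rg"
  dns_prefix          = "my-aks-cluster"

  # This creates the initial (default) node pool that every AKS cluster needs.
  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_B2ms"
  }

  identity {
    type = "SystemAssigned"
  }
}
```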
My definition of the azurerm_kubernetes_cluster_node_pool resource looks like this:
1resource "azurerm_kubernetes_cluster_node_pool" "pools" {
2 lifecycle {
3 ignore_changes = [
4 node_count
5 ]
6 }
7
8 for_each = var.az_aks_additional_node_pools
9 kubernetes_cluster_id = azurerm_kubernetes_cluster.kube.id
10 name = each.value.name
11 mode = each.value.mode
12 node_count = each.value.node_count
13 vm_size = each.value.vm_size
14 availability_zones = ["1", "2"]
15 max_pods = 250
16 os_disk_size_gb = 128
17 node_taints = each.value.taints
18 node_labels = each.value.labels
19 enable_auto_scaling = each.value.cluster_auto_scaling
20 min_count = each.value.cluster_auto_scaling_min_count
21 max_count = each.value.cluster_auto_scaling_max_count
22}
We use the for_each expression so that we can define and deploy multiple node pools later. The variable is defined as follows:
```hcl
variable "az_aks_additional_node_pools" {
  type = map(object({
    node_count                     = number
    name                           = string
    mode                           = string
    vm_size                        = string
    zones                          = list(string)
    taints                         = list(string)
    cluster_auto_scaling           = bool
    cluster_auto_scaling_min_count = number
    cluster_auto_scaling_max_count = number
    labels                         = map(string)
  }))
}
```
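Because the node pool resource uses for_each, every pool is addressable by its map key (systempool, infrapool and apppool further down). As a small illustration (this output is mine, not part of the original config), you could expose the resulting pool IDs like this:

```hcl
# Hypothetical output: maps each key of az_aks_additional_node_pools
# to the ID of the node pool that Terraform created for it.
output "node_pool_ids" {
  value = { for key, pool in azurerm_kubernetes_cluster_node_pool.pools : key => pool.id }
}
```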
Look at 'taints' and 'labels': taints are a list of strings, whereas labels are a map of strings. It took me an hour or so to figure this out, but I was also watching television at the same time. You need the labels and the taints to configure your workloads (Deployments and StatefulSets) so that their pods land on the correct node pool.
Finally, this is how I define the three node pools for a cluster:
```hcl
az_aks_additional_node_pools = {
  systempool = {
    node_count = 1
    mode       = "System"
    name       = "system"
    vm_size    = "Standard_B2ms"
    zones      = ["1", "2"]
    taints = [
      "CriticalAddonsOnly=true:NoSchedule"
    ]
    labels                         = null
    cluster_auto_scaling           = false
    cluster_auto_scaling_min_count = null
    cluster_auto_scaling_max_count = null
  }
  infrapool = {
    node_count = 1
    name       = "infra"
    mode       = "User"
    vm_size    = "Standard_B2ms"
    zones      = ["1", "2"]
    taints = [
      "InfraAddonsOnly=true:NoSchedule"
    ]
    labels = {
      nodepool = "infra"
    }
    cluster_auto_scaling           = false
    cluster_auto_scaling_min_count = null
    cluster_auto_scaling_max_count = null
  }
  apppool = {
    node_count = 2
    name       = "app16"
    mode       = "User"
    vm_size    = "Standard_A2m_v2"
    zones      = ["1", "2"]
    taints     = null
    labels = {
      nodepool = "app16"
    }
    cluster_auto_scaling           = true
    cluster_auto_scaling_min_count = 2
    cluster_auto_scaling_max_count = 4
  }
}
```
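The post doesn't show the workload side, so here is a hedged sketch of what directing a pod to the infra pool could look like, written with the Terraform Kubernetes provider (the Deployment itself and its names are made up for illustration): the node_selector matches the pool's nodepool label and the toleration matches its InfraAddonsOnly=true:NoSchedule taint.

```hcl
# Hypothetical Deployment (not part of the original config) pinned to the infra pool.
resource "kubernetes_deployment" "infra_example" {
  metadata {
    name = "infra-example"
  }

  spec {
    replicas = 1

    selector {
      match_labels = {
        app = "infra-example"
      }
    }

    template {
      metadata {
        labels = {
          app = "infra-example"
        }
      }

      spec {
        # Schedule only on nodes carrying the infra pool's label ...
        node_selector = {
          nodepool = "infra"
        }

        # ... and tolerate the taint that keeps other workloads off that pool.
        toleration {
          key      = "InfraAddonsOnly"
          operator = "Equal"
          value    = "true"
          effect   = "NoSchedule"
        }

        container {
          name  = "example"
          image = "nginx:1.25"
        }
      }
    }
  }
}
```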
Cleaning up the default node pool
When the cluster and its node pools are deployed, I let Jenkins clean up the default node pool because it is no longer needed.
```bash
az aks nodepool delete --resource-group $CLUSTER_FULL_NAME-rg --cluster-name $CLUSTER_FULL_NAME --name default
```