Azure Kubernetes node pools with Terraform

In Azure Kubernetes Service (AKS), nodes of the same configuration are grouped together into node pools. These node pools contain the underlying VMs that run your applications. The initial number of nodes and their size (SKU) is defined when you create an AKS cluster, which creates a system node pool. To support applications that have different compute or storage demands, you can create additional user node pools.

I just copied the above text from here because it is just right. To get a full understanding of node pools, I encourage you to read the whole article. Also, if you plan to run Azure Kubernetes in production, I can recommend this article as well. It's all about the sizing, baby!

This post is about deploying node pools with Terraform. I assume a bit of prior knowledge about Azure and Terraform modules.

Because we run multiple instances of AKS, I decided to make the number of node pools and their properties configurable. This article pointed me in that direction.

At work, we have a git repo with multiple cluster definitions (I treat them like cattle). The clusters are deployed in a Jenkins pipeline.

Terraform config

My goal is to create 3 node pools:

  • a system node pool for system pods
  • an infra node pool for infra pods (Vault, Elasticsearch and Prometheus to be precise)
  • an app node pool for our LOB apps

If you define an AKS cluster following the Terraform documentation, you will note that there is a default_node_pool block, but no way to define extra node pools inline. Instead, there is a separate resource for that: azurerm_kubernetes_cluster_node_pool.
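For context, here is a minimal sketch of the cluster resource with its built-in default_node_pool block. The resource group reference, names, and SKUs are placeholders; a real definition needs more settings (networking, RBAC, and so on):

```hcl
# Minimal AKS cluster sketch; "kube" matches the reference used by the
# node pool resource below. Resource group and SKU values are placeholders.
resource "azurerm_kubernetes_cluster" "kube" {
  name                = "example-aks"
  location            = azurerm_resource_group.rg.location
  resource_group_name = azurerm_resource_group.rg.name
  dns_prefix          = "example-aks"

  # The built-in system pool; additional pools are separate resources.
  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_B2ms"
  }

  identity {
    type = "SystemAssigned"
  }
}
```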

My definition of the azurerm_kubernetes_cluster_node_pool resource looks like this:

resource "azurerm_kubernetes_cluster_node_pool" "pools" {
  for_each = var.az_aks_additional_node_pools

  lifecycle {
    ignore_changes = [
      node_count
    ]
  }

  kubernetes_cluster_id = azurerm_kubernetes_cluster.kube.id
  name                  = each.value.name
  mode                  = each.value.mode
  node_count            = each.value.node_count
  vm_size               = each.value.vm_size
  availability_zones    = ["1", "2"]
  max_pods              = 250
  os_disk_size_gb       = 128
  node_taints           = each.value.taints
  node_labels           = each.value.labels
  enable_auto_scaling   = each.value.cluster_auto_scaling
  min_count             = each.value.cluster_auto_scaling_min_count
  max_count             = each.value.cluster_auto_scaling_max_count
}

We use the for_each meta-argument so we can define and deploy multiple node pools later. The variable is defined as follows:

variable "az_aks_additional_node_pools" {
  type = map(object({
    node_count                     = number
    name                           = string
    mode                           = string
    vm_size                        = string
    taints                         = list(string)
    cluster_auto_scaling           = bool
    cluster_auto_scaling_min_count = number
    cluster_auto_scaling_max_count = number
    labels                         = map(string)
  }))
}
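With for_each, every key in the map becomes its own resource instance, addressable as, for example, azurerm_kubernetes_cluster_node_pool.pools["infrapool"]. If you want to expose the pool IDs to other modules, a hypothetical output could look like this:

```hcl
# Hypothetical output: map of pool key => Azure resource ID
output "node_pool_ids" {
  value = { for k, pool in azurerm_kubernetes_cluster_node_pool.pools : k => pool.id }
}
```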

Look at 'taints' and 'labels': taints is a list of strings, whereas labels is a map of strings. It took me an hour or so to figure this out (although I was also watching television at the same time), because the documentation does not make it explicit. You need the labels and the taints to configure your workloads (Deployments and StatefulSets) so the pods are scheduled on the correct node pool.
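To direct a workload at, say, the infra pool, it needs both a toleration for the pool's taint and a nodeSelector (or node affinity) for the pool's label. A minimal sketch, assuming the taint and label from my pool definitions below (the Deployment name and image are placeholders):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-infra-app   # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-infra-app
  template:
    metadata:
      labels:
        app: example-infra-app
    spec:
      # Select nodes carrying the pool's label...
      nodeSelector:
        nodepool: infra
      # ...and tolerate the pool's taint so the pods may land there.
      tolerations:
        - key: "InfraAddonsOnly"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      containers:
        - name: app
          image: nginx:stable   # placeholder image
```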

Finally, this is how I define the 3 node pools for a cluster:

az_aks_additional_node_pools = {
  systempool = {
    node_count = 1
    mode       = "System"
    name       = "system"
    vm_size    = "Standard_B2ms"
    taints = [
      "CriticalAddonsOnly=true:NoSchedule"
    ]
    labels                         = null
    cluster_auto_scaling           = false
    cluster_auto_scaling_min_count = null
    cluster_auto_scaling_max_count = null
  }
  infrapool = {
    node_count = 1
    name       = "infra"
    mode       = "User"
    vm_size    = "Standard_B2ms"
    taints = [
      "InfraAddonsOnly=true:NoSchedule"
    ]
    labels = {
      nodepool = "infra"
    }
    cluster_auto_scaling           = false
    cluster_auto_scaling_min_count = null
    cluster_auto_scaling_max_count = null
  }
  apppool = {
    node_count = 2
    name       = "app16"
    mode       = "User"
    vm_size    = "Standard_A2m_v2"
    taints     = null
    labels = {
      nodepool = "app16"
    }
    cluster_auto_scaling           = true
    cluster_auto_scaling_min_count = 2
    cluster_auto_scaling_max_count = 4
  }
}

Cleaning up the default node pool

When the cluster and its node pools are deployed, I let Jenkins clean up the default node pool because it's no longer needed.

az aks nodepool delete --resource-group $CLUSTER_FULL_NAME-rg --cluster-name $CLUSTER_FULL_NAME --name default

Resources and inspiration