GKE Autopilot Cluster Mode

GKE Autopilot enables cost-effective, isolated development environments for cloud-native applications. It offers automated cluster management, enhanced security, and optimized resource allocation while reducing the Kubernetes administration burden.

Autopilot vs. Standard

Google Kubernetes Engine (GKE) provides two operational modes: Autopilot and Standard. These modes offer different levels of control and management for your Kubernetes environment, catering to various needs and preferences.

Technical Comparison

GKE Autopilot

  • Simplified Management: Autopilot mode is a fully managed option where Google takes care of most cluster management tasks. Google automatically handles node provisioning, scaling, upgrades, and configuration based on best practices.

  • Ease of Use: This mode is designed for users who want a simplified Kubernetes experience without the complexities of managing the underlying infrastructure. It’s ideal for teams who want to focus on developing and deploying applications rather than managing Kubernetes clusters.

  • Limited Customization: Autopilot mode offers less customization compared to Standard mode. You have fewer options to tweak the underlying infrastructure, as Google manages most of the settings.

GKE Standard

  • Flexibility and Control: Standard mode gives you granular control over the configuration and management of your Kubernetes cluster. You have the flexibility to customize node pools, scaling options, networking, and security settings to meet your specific requirements.

  • Responsibility: You are responsible for managing and maintaining the underlying infrastructure, including node provisioning, upgrades, and scaling. This mode is well-suited for experienced Kubernetes users who need full control and customization capabilities.

Key Differences

Feature              GKE Standard                                  GKE Autopilot
Cluster Management   Manual                                        Automatic
Node Configuration   Customizable                                  Pre-configured
Scaling              Manual or Horizontal Pod Autoscaling (HPA)    Automatic
Cost                 Pay for provisioned resources                 Pay for used resources
Customization        Highly customizable                           Limited customization

Pricing Comparison

GKE Autopilot clusters charge a flat fee of $0.10/hour per cluster, plus costs for your active workloads (see GKE Autopilot Pricing). Autopilot allocates resources based on your Pod specifications using a workload-driven model. For workloads with predictable resource needs, you can use Committed Use Discounts (CUDs) to lower costs.

GKE Autopilot provides a financially backed Service Level Agreement (SLA) with:

  • 99.95% availability for the control plane

  • 99.9% for Autopilot Pods in multiple zones

GKE Standard clusters incur a flat fee of $0.10/hour per cluster. In Standard mode, GKE uses Compute Engine instances as worker nodes. The compute instances are billed based on Compute Engine’s pricing until they are deleted. Compute Engine offers committed use discounts for the cluster instances.

GKE Standard provides a financially backed Service Level Agreement (SLA) with:

  • 99.95% availability for the control plane of Regional clusters

  • 99.5% for Zonal clusters

Break-even point

The break-even point between GKE Standard and Autopilot occurs when the cost of running a workload is equal in both modes. Autopilot is generally more cost-effective below this point, while Standard may be more economical above it.

  1. Resource Utilization: The primary factor is the average utilization of your CPU and memory resources. Autopilot shines at lower utilization levels (below 50-60%) due to its ability to scale down unused resources. Standard mode might be more cost-effective at higher utilization levels (above 70%) because you pay for provisioned resources, regardless of usage.

  2. Workload Characteristics: If the workload is predictable and consistently utilizes a high percentage of resources, Standard might be a better fit. However, if the workload is bursty or has fluctuating resource demands, Autopilot’s dynamic scaling can lead to significant cost savings.

  3. Node Configurations: The types of nodes in Standard mode can impact the break-even point. Larger instance types might make it harder to achieve high utilization, making Autopilot more attractive.

General Guidelines:

  • Low to Moderate Utilization (below 50-60%): Autopilot is generally more cost-effective due to its efficient resource scaling.

  • High Utilization (above 70%): Standard mode might be more cost-effective if you can consistently maintain high resource utilization.

  • Unpredictable Workloads: Autopilot’s dynamic scaling is ideal for handling unpredictable or bursty workloads.
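To make the break-even idea concrete, here is a small sketch using illustrative per-vCPU-hour rates (the prices below are assumptions for the arithmetic, not current GCP list prices). In Standard you pay the node rate whether a vCPU is used or not, so the effective cost per utilized vCPU-hour is the node rate divided by utilization; in Autopilot you pay a fixed rate per requested vCPU-hour. The two are equal when utilization equals the ratio of the rates.

```shell
# Break-even sketch. All prices are illustrative assumptions, not GCP list prices.
# Standard: effective cost per *utilized* vCPU-hour = node_rate / utilization.
# Autopilot: fixed rate per requested vCPU-hour.
# Costs match when utilization = node_rate / autopilot_rate.
STANDARD_VCPU_HR=0.022     # assumed Standard per-vCPU-hour (node price / vCPU count)
AUTOPILOT_VCPU_HR=0.0445   # assumed Autopilot per-requested-vCPU-hour

awk -v n="$STANDARD_VCPU_HR" -v a="$AUTOPILOT_VCPU_HR" \
  'BEGIN { printf "break-even utilization: %.0f%%\n", 100 * n / a }'
```

With these assumed rates the break-even lands near 49% utilization, which is consistent with the 50-60% rule of thumb above; real rates vary by region and machine type.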

Why GKE Autopilot?

I chose GKE Autopilot for the following reasons:

Application Isolation

Autopilot enhances security by isolating applications in separate clusters. This separation limits the impact of misconfigurations to a single GKE instance and reduces cross-application vulnerabilities.

Autopilot assigns each application to a distinct node pool, which prevents resource conflicts between workloads. This setup improves both performance isolation and security.

Cost savings

Dedicating a GKE cluster to a single application isolates its development process. This approach prevents issues from misconfiguration or changes in a shared GKE cluster from affecting other applications. An idle GKE Autopilot cluster costs $0.10 per hour, while the average cost of a software developer is $80 per hour. Even if the dedicated cluster is idle 50% of the time, it is more cost-effective than paying developer time to troubleshoot issues in a shared cluster.

Autopilot dynamically scales resources based on demand. This feature is particularly beneficial when a GKE instance is dedicated to a single application, as resource allocation is optimized for that specific workload. It also offers cost savings during the development cycle by enabling minimal resource allocation for the development and testing of each merge request.

Configure GKE Autopilot Cluster Mode

Terraform Code

Create Service Account
locals {
  gke_project_roles = [
    "roles/logging.logWriter",
    "roles/monitoring.metricWriter",
    "roles/viewer",
  ]
}

resource "google_service_account" "gke1" {
  account_id   = "gke-${var.app_id}-01"
  display_name = "GKE Service Account"
}

resource "google_project_iam_member" "gke1" {
  count   = length(local.gke_project_roles)
  project = var.google_project
  role    = local.gke_project_roles[count.index]
  member  = "serviceAccount:${google_service_account.gke1.email}"
}
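The cluster resource below references a VPC network (`google_compute_network.main`) and a subnetwork (`google_compute_subnetwork.gke_net`) with two secondary ranges that are not shown in this article. A minimal sketch of those resources follows; the names and CIDR ranges are assumptions and should be adapted to your address plan.

```hcl
# Sketch of the network resources referenced by the cluster (CIDRs are assumptions).
resource "google_compute_network" "main" {
  name                    = "${var.app_id}-net"
  project                 = var.google_project
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "gke_net" {
  name          = "${var.app_id}-gke"
  project       = var.google_project
  region        = var.google_region
  network       = google_compute_network.main.id
  ip_cidr_range = "10.10.0.0/20"

  secondary_ip_range {
    range_name    = "gke-pods" # index 0: cluster_secondary_range_name
    ip_cidr_range = "10.20.0.0/16"
  }
  secondary_ip_range {
    range_name    = "gke-services" # index 1: services_secondary_range_name
    ip_cidr_range = "10.30.0.0/20"
  }
}
```

Note that the cluster block reads the secondary ranges by index, so the order of the `secondary_ip_range` blocks matters: Pods first, Services second.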
Create GKE Autopilot
resource "google_container_cluster" "gke1" {
  name                = "${var.app_id}-01"
  project             = var.google_project
  location            = var.google_region
  deletion_protection = false
  enable_autopilot    = true

  network    = google_compute_network.main.id
  subnetwork = google_compute_subnetwork.gke_net.id

  ip_allocation_policy {
    services_secondary_range_name = google_compute_subnetwork.gke_net.secondary_ip_range[1].range_name
    cluster_secondary_range_name  = google_compute_subnetwork.gke_net.secondary_ip_range[0].range_name
  }

  cluster_autoscaling {
    auto_provisioning_defaults {
      service_account = google_service_account.gke1.email
      management {
        auto_repair  = true
        auto_upgrade = true
      }
    }
  }

  release_channel {
    channel = "REGULAR"
  }

  private_cluster_config {
    enable_private_endpoint = false
    enable_private_nodes    = true
    master_global_access_config {
      enabled = true
    }
  }

  logging_config {
    enable_components = [
      "SYSTEM_COMPONENTS",
      "WORKLOADS",
      "SCHEDULER",
    ]
  }

  monitoring_config {
    enable_components = [
      "SYSTEM_COMPONENTS",
      "STORAGE",
      "DEPLOYMENT",
      "STATEFULSET",
    ]
  }

  maintenance_policy {
    recurring_window {
      start_time = "2024-01-01T09:00:00Z"
      end_time   = "2024-01-01T17:00:00Z"
      recurrence = "FREQ=WEEKLY;BYDAY=SA,SU"
    }
  }

  master_auth {
    client_certificate_config {
      issue_client_certificate = false
    }
  }

  master_authorized_networks_config {
    gcp_public_cidrs_access_enabled = true
    cidr_blocks {
      display_name = "GCP Internal Network"
      cidr_block   = google_compute_subnetwork.gke_net.ip_cidr_range
    }
    cidr_blocks {
      display_name = "Bell Canada"
      cidr_block   = "142.198.0.0/16"
    }
  }
}

Post Deployment

After initial deployment, GKE Autopilot does not provision any nodes. All system Pods remain Pending until a workload is deployed and requests resources, at which point Autopilot provisions nodes to satisfy those requests.
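Before inspecting the cluster with kubectl, fetch kubeconfig credentials. The values below are placeholders derived from the Terraform variables used above; substitute your own cluster name, region, and project.

```shell
# Placeholder values; substitute your app_id, region, and project.
gcloud container clusters get-credentials "${APP_ID}-01" \
  --region "${GOOGLE_REGION}" \
  --project "${GOOGLE_PROJECT}"
```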

GKE Autopilot Initial Deployment
(0) > kubectl get nodes 
No resources found

(0) > kubectl get pods --all-namespaces 
NAMESPACE         NAME                                                      READY   STATUS    RESTARTS   AGE
gke-gmp-system    alertmanager-0                                            0/2     Pending   0          4h12m
gke-gmp-system    gmp-operator-69c5b5fd9d-6sddk                             0/1     Pending   0          4h12m
gke-gmp-system    rule-evaluator-5ffd7f75b-qtjr9                            0/2     Pending   0          4h12m
gke-managed-cim   kube-state-metrics-0                                      0/2     Pending   0          4h13m
kube-system       antrea-controller-horizontal-autoscaler-9df77d778-7wqrk   0/1     Pending   0          4h12m
kube-system       egress-nat-controller-56dcfcdbff-52h2f                    0/1     Pending   0          4h12m
kube-system       event-exporter-gke-7c4fd479b6-nzwl2                       0/2     Pending   0          4h13m
kube-system       konnectivity-agent-7dcf885989-dzbtc                       0/2     Pending   0          4h12m
kube-system       konnectivity-agent-autoscaler-67d4f7d5f-ppglh             0/1     Pending   0          4h12m
kube-system       kube-dns-57669c98c6-47r72                                 0/5     Pending   0          4h13m
kube-system       kube-dns-autoscaler-79b96f5cb-qjr7q                       0/1     Pending   0          4h13m
kube-system       l7-default-backend-db86fddff-fs2zg                        0/1     Pending   0          4h12m
kube-system       metrics-server-v0.7.1-5cc74f6f98-lfvww                    0/2     Pending   0          4h12m
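Deploying any workload causes Autopilot to provision nodes and schedule the pending system Pods. A minimal Deployment to trigger this is sketched below; the name and image are illustrative. Resource requests matter in Autopilot, since nodes are sized and billed based on what Pods request.

```yaml
# Minimal Deployment to trigger Autopilot node provisioning (name/image assumed).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: hello
          image: us-docker.pkg.dev/google-samples/containers/gke/hello-app:1.0
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
```

After applying the manifest, `kubectl get nodes` should show nodes appearing within a few minutes, and the system Pods should move from Pending to Running.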