Self-hosted. Open source.

Burst incredible. No hassle.

When a pod can’t be scheduled, Cloudburst adds the cheapest node across seven clouds. One brain (CloudBroker), one set of hands (the Autoscaler). No manual resizing. No lock-in.

  • Cost-aware — cheapest VM that fits
  • Seven providers — GCP, AWS, Azure, Hetzner, Scaleway, DigitalOcean, OVH
  • Scale down — cordon, drain, delete when empty

A new node, with peace of mind.

Get a burst node in minutes. The cluster picks the cheapest cloud.

Unschedulable pod → Autoscaler asks CloudBroker for the cheapest instance → provisions that VM on the right cloud → bootstrap (Tailscale, kubeadm join) → node Ready. When the node’s empty long enough, it’s torn down. Billing stops.

What you get
Burst
  • NodePool + NodeClass
  • NodeClaim per burst node
  • Tailscale + kubeadm join
  • Scale-down when empty

Simple. Fast. Yours.

Simple

Define NodePool (policy, max price, scale-down TTL) and NodeClass (credentials, Tailscale). The system creates a NodeClaim and VM when a pod can’t be scheduled.

Fast & easy

CloudBroker + Tailscale + provider credentials. kubectl apply CRDs and samples. Watch the controller provision a VM and join the node.

Cost-aware

CloudBroker recommends the cheapest instance that fits. No manual picking. No lock-in. You run both; you own the data.

Our providers? Glad you asked.

Burst to the cheapest node across seven clouds.

GCP AWS Azure Hetzner Scaleway DigitalOcean OVH

Articles

Four-part intro: the problem and the product, how it works, the interface (NodePool, NodeClass, NodeClaim), and the scope and how to run it.

When Your Cluster Runs Out of Room: Cloudburst Autoscaler Adds the Cheapest Node

The problem and the product: cost-aware, multi-cloud burst for Kubernetes.

A pod is stuck in "pending." No node has enough CPU or memory. Normally a human adds a node — pick a provider, create a VM, join it, remember to delete it later. Cost? Whatever that one provider charges. Cloudburst Autoscaler automates that and bakes cost into the decision: it asks a recommendation API for the cheapest instance that fits, provisions that VM, turns it into a node, and tears it down when the workload is gone. No manual resizing; no single-cloud default.

The capacity problem

Kubernetes leaves the pod Pending with reason Unschedulable. Someone has to add capacity. In practice that's manual, repetitive, and rarely optimized for price. You resize a node group in one cloud, or you spin up a VM by hand and run kubeadm join. Multi-cloud and cost-aware burst require a system that can compare options across providers and then provision the chosen one. Doing that by hand every time doesn't scale.
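
For illustration, a pod like this (name and sizes are placeholders) sits Pending with reason Unschedulable on a cluster whose nodes can't spare 16 CPUs:

apiVersion: v1
kind: Pod
metadata:
  name: burst-demo              # illustrative name
spec:
  containers:
    - name: worker
      image: busybox
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: "16"             # more CPU than any existing node offers
          memory: "8Gi"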

What Cloudburst Autoscaler does

The Autoscaler watches for unschedulable pods and aggregates demand: how much CPU and RAM do those pods need in total? It sends that, plus your policy (allowed providers, max price, region), to a recommendation API. The API returns a ranked answer: "use this instance type, in this region, on this provider." The Autoscaler provisions that VM on the right cloud, injects a bootstrap script so the machine installs Tailscale (to reach the control plane), containerd, kubelet, and kubeadm, then runs kubeadm join. The VM becomes a node. The scheduler places the pod. When the node has been empty long enough, the controller cordons, drains, deletes the node object, and deletes the VM. Billing stops. All of that is automatic.

Why this shape

The system splits two concerns. "What's cheapest?" lives in a recommendation API (e.g. CloudBroker): one service that knows instance types and hourly rates across seven clouds and answers "what do you recommend for these constraints?" "Create and delete the VM" lives in the Autoscaler: it doesn't own pricing logic; it just calls the API and provisions the top result. That separation keeps the Autoscaler flexible. You can swap or tune the recommendation service without changing how the controller provisions. The Autoscaler is the provisioner; the API is the price brain.

The next piece explains how it works end to end: the flow from unschedulable pod to new node, and how the VM becomes a node (Tailscale, bootstrap, join).

From Pending Pod to New Node: The Cloudburst Autoscaler Flow

How it works: demand → recommendation → provision → bootstrap → join → scale-down.

The trigger is an unschedulable pod. The Autoscaler aggregates demand (how much CPU and RAM are needed), sends that to the recommendation API with constraints from your policy (allowed providers, max price, region), and gets back a target: "create this instance type, in this region, on this provider." It creates the VM with a bootstrap script that installs Tailscale (so the VM can reach the control plane), then containerd, kubelet, kubeadm, and runs kubeadm join. The VM becomes a node; the pod schedules. When the node is empty long enough, the controller tears it down. This piece is the "what happens, step by step."

sequenceDiagram
  participant Pod
  participant Autoscaler
  participant CloudBroker
  participant Provider
  participant VM
  Pod->>Autoscaler: Pending (Unschedulable)
  Autoscaler->>CloudBroker: POST /api/recommendations
  CloudBroker-->>Autoscaler: Top recommendation
  Autoscaler->>Provider: Create VM (user data = bootstrap)
  Provider-->>VM: Boot + cloud-init
  VM->>VM: Tailscale + kubeadm join
  VM-->>Autoscaler: Node Ready
  Autoscaler->>Pod: Scheduler places pod
  Note over Pod,VM: Scale-down: cordon, drain, delete VM

The trigger: unschedulable pods

Kubernetes leaves the pod Pending. The Autoscaler aggregates demand — how much vCPU and RAM do all unschedulable pods need? — and calls the recommendation API with min vCPU, min RAM, arch, region, max price, and allowed providers from your policy (NodePool). The API returns the top recommendation. The Autoscaler now has a target: "create this instance type, in this region, on this provider." It looks up a NodeClass that supports that provider (credentials and config for that cloud), creates a NodeClaim (the object that represents one burst node from creation to deletion), and passes it the recommendation and the chosen NodeClass.
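
Sketched as an HTTP call, the request might look like this; the endpoint is the one named above, but the field names and the CLOUDBROKER_URL placeholder are assumptions for illustration:

curl -s -X POST "$CLOUDBROKER_URL/api/recommendations" \
  -H "Content-Type: application/json" \
  -d '{
        "minVcpu": 4,
        "minMemoryGb": 8,
        "arch": "x86_64",
        "regionConstraint": "EU",
        "maxPriceEurPerHour": 0.15,
        "allowedProviders": ["hetzner", "scaleway"]
      }'
# The top recommendation names a provider, region, instance type, and hourly price.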

Provisioning and bootstrap

The Autoscaler creates the VM on the chosen provider (GCP, AWS, Azure, Hetzner, Scaleway, DigitalOcean, or OVH). It passes user data: a script that runs once at first boot (cloud-init or the provider's equivalent).

Why Tailscale? Burst nodes live in arbitrary clouds and networks. The control plane might be a Kind cluster on a laptop, or EKS in a VPC. Direct connectivity would require opening ports or managing a VPN. Tailscale gives every machine an identity and a secure path to every other machine in your tailnet. The VM joins with an auth key from a Kubernetes secret (referenced in the NodeClass). Once on the tailnet, it reaches the API server at the Tailscale IP you configured — e.g. the host's Tailscale IP and port 6443. No per-provider firewall dance.

What runs on first boot: The script sets the hostname to the NodeClaim name (so the Kubernetes node name matches the VM). It installs and configures Tailscale with the auth key and gets a Tailscale IP. It installs containerd, kubelet, and kubeadm at the version in the NodeClass. It configures kubelet to use the Tailscale IP as the node IP and the same cgroup driver as the cluster. Then it runs kubeadm join with discovery info pointing at the control plane's Tailscale address. The join token and discovery material are generated by the Autoscaler and embedded in the script. No long-lived agent; everything needed for join is in that one script.
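
A minimal sketch of that first-boot script, assuming all values are injected by the Autoscaler when it renders the user data (the generated script differs in detail; package repository setup and cgroup driver configuration are omitted here):

#!/bin/bash
set -euo pipefail

# Injected by the Autoscaler (placeholders shown).
NODE_NAME="cb-burst-pool-x7kq2"       # NodeClaim name = node name = VM name
TS_AUTHKEY="tskey-auth-..."           # from the secret referenced in the NodeClass
K8S_VERSION="1.34.3"                  # from the NodeClass bootstrap config
JOIN_ENDPOINT="100.x.x.x:6443"        # control plane's Tailscale address
JOIN_TOKEN="abcdef.0123456789abcdef"
CA_CERT_HASH="sha256:..."

# Node name matches the NodeClaim.
hostnamectl set-hostname "$NODE_NAME"

# Join the tailnet and record this machine's Tailscale IP.
curl -fsSL https://tailscale.com/install.sh | sh
tailscale up --authkey="$TS_AUTHKEY" --hostname="$NODE_NAME"
NODE_IP="$(tailscale ip -4)"

# Install the container runtime and Kubernetes components at the pinned version.
apt-get update
apt-get install -y containerd kubelet="${K8S_VERSION}-*" kubeadm="${K8S_VERSION}-*"

# Kubelet advertises the Tailscale IP as the node IP.
echo "KUBELET_EXTRA_ARGS=--node-ip=${NODE_IP}" > /etc/default/kubelet

kubeadm join "$JOIN_ENDPOINT" \
  --token "$JOIN_TOKEN" \
  --discovery-token-ca-cert-hash "$CA_CERT_HASH" \
  --node-name "$NODE_NAME"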

sequenceDiagram
  participant VM
  participant Tailscale
  participant ControlPlane
  VM->>VM: Set hostname
  VM->>Tailscale: Join tailnet (auth key)
  Tailscale-->>VM: Tailscale IP
  VM->>VM: Install containerd, kubelet, kubeadm
  VM->>VM: Configure kubelet (node IP = Tailscale IP)
  VM->>ControlPlane: kubeadm join (Tailscale IP:6443)
  ControlPlane-->>VM: Node registered
  VM->>VM: Ready

From joining to Ready

The Autoscaler watches the cluster for a new node with the expected name (the NodeClaim name). When that node's Ready condition is true, the controller marks the NodeClaim as Ready. The scheduler sees the new capacity and places the pending pod onto it. Demand → recommendation → VM → bootstrap → node Ready → pod scheduled.
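
To follow the same transition by hand, something like this works; the node name is illustrative, and the 20-minute timeout mirrors the controller's join timeout:

kubectl wait --for=condition=Ready node/cb-burst-pool-x7kq2 --timeout=20m
kubectl get node cb-burst-pool-x7kq2 -o wide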

Scale-down

When the node has no workloads (except system daemons) and has been empty for the configured time (e.g. in NodePool: ttlSecondsAfterEmpty), the Autoscaler marks the NodeClaim for deletion. It cordons the node (no new pods), drains it (evicts remaining workloads gracefully), deletes the Kubernetes node object, and calls the provider API to delete the VM. Billing stops. The cluster shrinks back.
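
The controller drives this through the Kubernetes and provider APIs; done by hand, the node-side equivalent would be (node name illustrative):

kubectl cordon cb-burst-pool-x7kq2
kubectl drain cb-burst-pool-x7kq2 --ignore-daemonsets --delete-emptydir-data
kubectl delete node cb-burst-pool-x7kq2
# ...then the VM itself is deleted via the provider's API, which is when billing stops.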

Watch the flow from the cluster:

kubectl get pods -A | grep Pending
kubectl get nodeclaims -A
kubectl get nodes -l cloudburst.io/provider

That's the mechanism. The next piece is the interface: how you configure it — NodePool, NodeClass, and NodeClaim.

How You Tell Cloudburst What's Allowed: NodePools, NodeClasses, and NodeClaims

The interface: NodePool, NodeClass, NodeClaim.

Cloudburst Autoscaler doesn't guess what clouds you use or how much you're willing to pay. You tell it via two kinds of resources: NodePool (scaling policy — which providers, max price, min/max nodes, how long an empty node can sit before it's removed) and NodeClass (per-provider configuration: credentials, region/zone, join config, Tailscale auth). The system then creates a third resource, NodeClaim, for each burst node. You don't create those by hand. One NodeClaim = one VM from creation to deletion.

NodePool

NodePool is the user-facing policy. It answers "under what rules can the system add capacity?"

In the requirements section you define instance constraints. Architecture: usually x86_64 or arm64. Max price per hour in EUR — no recommendation will exceed this. Allowed providers: the list of clouds the Autoscaler is allowed to use. This list is sent straight to CloudBroker. If you set ["hetzner", "scaleway"], CloudBroker only considers those two; the Autoscaler will only ever provision on Hetzner or Scaleway. You can also set an optional region constraint (e.g. "EU only") so CloudBroker restricts to EU regions.

Limits define how big the burst can get. Max nodes: cap on how many burst nodes can exist at once. Min nodes: floor (often zero for pure burst).

Disruption controls scale-down. ttlSecondsAfterEmpty: how long a node can sit with no workloads before the controller deletes it. ttlSecondsUntilExpired: optional max lifetime; after that the node is removed regardless of occupancy. That avoids "forgotten" long-lived burst nodes.

Template is where you set labels (and optionally taints) that will be applied to every provisioned node. Pods can then target those labels if you want burst capacity for specific workloads.

NodePool never names a specific VM. It says: "I'm willing to burst to these providers under these rules." The actual choice is CloudBroker's.

Example NodePool (Scaleway, EU, max €0.15/h, scale to zero):

apiVersion: cloudburst.io/v1alpha1
kind: NodePool
metadata:
  name: scaleway-nodepool
  namespace: default
spec:
  requirements:
    regionConstraint: "EU"
    arch: ["x86_64"]
    maxPriceEurPerHour: 0.15
    allowedProviders: ["scaleway"]
  limits:
    maxNodes: 3
    minNodes: 0
  template:
    labels:
      cloudburst.io/nodepool: "scaleway-nodepool"
      cloudburst.io/provider: "scaleway"
  disruption:
    ttlSecondsAfterEmpty: 60
    ttlSecondsUntilExpired: 3600
  weight: 1
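
If you want specific workloads to land only on burst capacity from this pool, a pod can select the template labels defined above (a minimal illustration; name and sizes are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: burst-only-job          # illustrative name
spec:
  nodeSelector:
    cloudburst.io/nodepool: "scaleway-nodepool"
  containers:
    - name: worker
      image: busybox
      command: ["sleep", "600"]
      resources:
        requests:
          cpu: "2"
          memory: "2Gi"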

NodeClass

NodeClass holds infrastructure and credentials. It's the "how do we talk to this cloud and how do we join the cluster?" config.

A NodeClass can have configuration for one or more providers. For GCP you'd set project and zone (and optionally image family). For Hetzner you'd set location and a reference to a secret that holds the API token. For AWS you'd set region, AMI, and optional subnet and security groups. For Azure you'd set subscription, resource group, location, subnet ID, and a secret reference for the service principal. And so on for Scaleway, DigitalOcean, OVH. You don't have to fill every provider; only the ones you allow in your NodePools.

Join config is shared: the address of the control plane (typically a Tailscale IP and port, e.g. https://100.x.x.x:6443) and a token TTL for kubeadm join. Tailscale gets a reference to a secret that holds the auth key so the VM can join your tailnet. Bootstrap specifies the Kubernetes version to install on the VM.

The Autoscaler matches NodeClass to the provider CloudBroker recommended. If CloudBroker says "use Hetzner," the controller finds a NodeClass that has Hetzner config and uses that. You can have multiple NodeClasses — e.g. one for GCP, one for Hetzner — and the controller picks the one that fits.

Example NodeClass (Scaleway) and credentials secret:

apiVersion: cloudburst.io/v1alpha1
kind: NodeClass
metadata:
  name: scaleway-nodeclass
  namespace: default
spec:
  scaleway:
    zone: "fr-par-1"
    projectID: "your-scaleway-project-id"
    image: "ubuntu_jammy"
    apiKeySecretRef:
      name: scaleway-credentials
      key: SCW_SECRET_KEY
  join:
    hostApiServer: "https://<HOST_TAILSCALE_IP>:6443"
    kindClusterName: "cloudburst"
    tokenTtlMinutes: 60
  tailscale:
    authKeySecretRef:
      name: tailscale-auth
      key: authkey
  bootstrap:
    kubernetesVersion: "1.34.3"

Replace <HOST_TAILSCALE_IP> with your control plane's Tailscale IP. Create a separate secret tailscale-auth with key authkey for the Tailscale auth key.
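
For example, both secrets could be created like this (values are placeholders):

kubectl create secret generic scaleway-credentials -n default \
  --from-literal=SCW_SECRET_KEY='<your-scaleway-secret-key>'
kubectl create secret generic tailscale-auth -n default \
  --from-literal=authkey='<tskey-auth-...>'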

NodeClaim

NodeClaim is one provisioned node. The Autoscaler creates it; the Autoscaler deletes it. Users don't create or edit NodeClaims.

Each NodeClaim has a status that moves through phases. Pending: recommendation received, not yet provisioned. Provisioning: VM creation in progress. Joining: VM is up, bootstrap is running, waiting for the node to report Ready. Ready: node is part of the cluster and can run pods. Deleting: cordon, drain, delete node object, delete VM.

The same name is used for the NodeClaim, the Kubernetes Node, and the cloud VM (with a cb- prefix for Cloudburst-managed resources). So you can correlate logs and dashboards: the NodeClaim cb-burst-pool-x7kq2 is the node cb-burst-pool-x7kq2 is the VM cb-burst-pool-x7kq2.

Inspect resources (Autoscaler creates NodeClaims; you create NodePools and NodeClasses):

kubectl get nodepools
kubectl get nodeclasses
kubectl get nodeclaims
kubectl get nodeclaims -o wide   # includes status phase

How they connect

The chain is: pod goes pending → Autoscaler calculates demand → calls CloudBroker with the NodePool's allowed providers and constraints (max price, region, etc.) → CloudBroker returns provider X, region Y, instance type Z → Autoscaler finds a NodeClass that supports X → creates a NodeClaim with that NodeClass and the recommended instance/region → provisions the VM, injects the bootstrap script, waits for join → node is Ready → pod is scheduled. Policy (NodePool) and infrastructure (NodeClass) are defined by you. The NodeClaim and the VM lifecycle are fully automatic.

With the interface clear, the final piece is scope, limits, and how to run the Autoscaler.

Cloudburst Autoscaler: Scope, Limits, and How to Run It

The full picture: what's in, what's out, and how to get from zero to burst.

flowchart LR
  subgraph CloudBroker["CloudBroker"]
    API[Recommendation API]
    DB[(PostgreSQL)]
    API --> DB
  end
  subgraph Cloudburst["Cloudburst Autoscaler"]
    Ctrl[Controller]
    Ctrl -->|POST /api/recommendations| API
    Ctrl -->|Create/delete VM| Providers[GCP, AWS, Hetzner, ...]
  end
  K8s[Kubernetes cluster]
  Ctrl --> K8s

CloudBroker = price brain; Cloudburst = provisioner; both self-hosted.

We've covered the problem and the product, how it works (flow and bootstrap), and the interface (NodePool, NodeClass, NodeClaim). This piece is the capstone: the full scope of Cloudburst Autoscaler, the boundaries, and a short "run with CloudBroker" path so you can try it.

The combined scope

The project delivers automatic, cost-aware, multi-cloud burst capacity on Kubernetes. When a pod can't be scheduled, the system detects the demand, asks CloudBroker for the cheapest suitable instance across seven providers, provisions that VM on the right cloud, bootstraps it (Tailscale + Kubernetes join), and waits for the node to be Ready. The pod is scheduled; the job runs. When the pod is gone and the node has been empty long enough, the controller cordons, drains, deletes the node, and deletes the VM. Billing stops. All of that is automatic. Cost is an input to the decision — the "where" and "what" come from the recommendation API. No manual resizing. No single-cloud default.

What's out of scope

A few boundaries keep the system focused. The project does not resize managed node groups (e.g. "add one to my GKE pool" or "scale my EKS node group"). It creates standalone VMs and turns them into nodes so it can use any of the seven providers, including ones that don't host your main cluster. It does not use serverless burst (Fargate, Cloud Run, Azure Container Instances); capacity is always a normal VM running kubelet. It does not replace or manage your control plane; it only adds and removes worker nodes. It does not re-provision every few minutes to chase the lowest possible price; it uses the recommendation at the moment it needs to scale up. One Autoscaler instance serves one cluster — no multi-cluster orchestration. And both CloudBroker and Cloudburst are self-hosted in this project; there's no SaaS offering. You run them, you own the data and the credentials.

How to run both

Minimal sequence:

1. Run CloudBroker: clone the repo, set up .env, start the stack (e.g. make up), run migrations and seed (make migrate, make seed), then run ingestion for the providers you care about (make ingest-hetzner, make ingest-all, etc.). The recommendation API is now live.
2. Deploy Cloudburst Autoscaler into your Kubernetes cluster (or run it locally against the cluster). Point it at the cluster and at CloudBroker's URL. Ensure it has the credentials it needs for the providers you allow — those live in NodeClasses and in secrets (e.g. GCP service account, Hetzner API token).
3. Create at least one NodeClass (per provider you'll use) with join config, Tailscale auth key reference, and bootstrap version.
4. Create at least one NodePool with allowed providers, max price, and disruption settings.
5. Create a workload that can't be scheduled — e.g. a pod that requests more CPU than any existing node has. Watch the Autoscaler create a NodeClaim, provision the VM, wait for join, and schedule the pod.
6. Delete the workload. Watch scale-down: cordon, drain, delete node, delete VM.

For detailed steps and per-provider manifests, the project's README and the provider testing guide in the docs have the full walkthrough.

Example: run both from scratch

# 1. CloudBroker (from cloudbroker repo)
cp .env.example .env
make up && make migrate && make seed
make ingest-hetzner   # or ingest-gcp, ingest-all, etc.

# 2. Cloudburst Autoscaler (from cloudburst-autoscaler repo)
# Install CRDs and run the controller (see README for deploy steps)
kubectl apply -f config/crd/bases/
# Create secrets (Tailscale auth, provider API keys), then:
kubectl apply -f config/samples/nodeclass-scaleway.yaml
kubectl apply -f config/samples/nodepool-scaleway.yaml

# 3. Trigger burst: create a pod that can't be scheduled
kubectl apply -f config/samples/workload-scaleway.yaml
kubectl get nodeclaims
kubectl get nodes -l cloudburst.io/provider

# 4. Scale-down: delete the workload
kubectl delete -f config/samples/workload-scaleway.yaml
# After ttlSecondsAfterEmpty, node and VM are removed

Sample manifests live in config/samples/ (NodePool, NodeClass, secrets, and a high-resource workload that triggers provisioning).

Summary and closing

CloudBroker is the price brain. Cloudburst Autoscaler is the provisioner. Together they give Kubernetes automatic, cost-aware, multi-cloud burst capacity. This series has walked through the problem and the product, how it works (flow and bootstrap), the interface (NodePool, NodeClass, NodeClaim), and scope. For operators who want to reduce toil and avoid single-cloud lock-in, the stack is ready to run. The cluster doesn't just scale — it shops.

Architecture

DemandDetector watches pods and enqueues NodePool reconciliation. The NodePool controller aggregates unschedulable demand, calls CloudBroker with allowedProviders, and creates a NodeClaim when capacity is needed. The NodeClaim controller generates the bootstrap script, provisions the VM, and monitors join (20-minute timeout). Bootstrap on the VM: Tailscale → containerd → kubelet/kubeadm → kubeadm join over the Tailscale IP. Scale-down: cordon → drain → delete node → delete VM. NodeClaim phases: Pending → Provisioning → Joining → Ready → Deleting.

graph TB
    POD["Pending Pod (unschedulable)"]
    subgraph Controllers ["Cloudburst Controllers"]
        DD["Demand Detector (watches pods)"]
        NPC["NodePool Controller (scale-up / scale-down)"]
        NCC["NodeClaim Controller (VM lifecycle)"]
    end
    CB["CloudBroker API (instance recommendation)"]
    subgraph Providers ["Cloud Providers"]
        GCP["GCP"]
        SCW["Scaleway"]
        HTZ["Hetzner"]
        DO["DigitalOcean"]
        OVH["OVH"]
        AWS["AWS"]
        AZ["Azure"]
    end
    BS["Bootstrap (cloud-init + kubeadm join over Tailscale)"]
    NODE["New K8s Node (Ready)"]
    POD -->|"detected"| DD
    DD -->|"enqueue"| NPC
    NPC -->|"POST /api/recommendations"| CB
    CB -->|"cheapest instance"| NPC
    NPC -->|"create NodeClaim"| NCC
    NCC -->|"provision VM"| Providers
    Providers --> BS
    BS -->|"node joins cluster"| NODE
    NODE -->|"pod scheduled"| POD
    NPC -->|"scale-down: delete NodeClaim"| NCC
    NCC -->|"cordon + drain + delete VM"| Providers
Full architecture doc →

Examples

With CloudBroker and Tailscale running, apply CRDs and samples, then create a pod that can’t be scheduled:

kubectl apply -f config/crd/bases/
kubectl apply -f config/samples/nodeclass-scaleway.yaml
kubectl apply -f config/samples/nodepool-scaleway.yaml
kubectl apply -f config/samples/workload-scaleway.yaml

kubectl get nodeclaims
kubectl get nodes -l cloudburst.io/provider

Secrets (Tailscale auth key, provider API keys) go in Kubernetes secrets; reference them from the NodeClass. See the repo config/samples/ for per-provider manifests.

Got questions? Here are the answers.

What is Cloudburst?

Cloudburst is a Kubernetes controller that adds burst nodes when pods can’t be scheduled. It asks CloudBroker for the cheapest VM that fits, provisions it on the right cloud (GCP, AWS, Azure, Hetzner, Scaleway, DigitalOcean, OVH), and bootstraps it with Tailscale + kubeadm join. When the node is empty long enough, it cordons, drains, and deletes the VM.

What do I need to run it?

CloudBroker (for recommendations), Tailscale (for networking), and provider API credentials. You define NodePool and NodeClass; the system creates NodeClaims and VMs. See the repo for config/samples/ and make kind-setup-and-deploy.

How much does it cost?

It’s open source. You run it. You pay for the VMs you burst to — and the controller picks the cheapest one that fits. No monthly fee to us.

Can I scale down?

Yes. When the node has been empty for ttlSecondsAfterEmpty (set in NodePool), the controller cordons, drains, and deletes the VM. Billing stops.

Get running in a few steps

Prerequisites: CloudBroker, Tailscale, provider credentials.

1. cp .env.example .env
2. make kind-setup-and-deploy
3. Apply NodeClass + NodePool

See the repo for full manifests and per-provider samples.

How they work together

Powered by CloudBroker: the recommendation API that knows prices across seven clouds. Cloudburst asks; CloudBroker answers.

flowchart LR
    Cluster --> Cloudburst
    Cloudburst -->|"POST /api/recommendations"| CloudBroker
    CloudBroker -->|"cheapest VM"| Cloudburst
    Cloudburst -->|"provision"| Provider
    Provider --> NodeReady["Node Ready"]

Burst incredible. One repo.

Your cluster. Your clouds. No lock-in.

View on GitHub