Solving the NixOS SOPS Bootstrap Problem

You’ve just finished a nixos-anywhere deploy. The server rebooted into NixOS. You SSH in, feeling good about yourself, and run nixos-rebuild switch --flake .#myhost. Nix starts evaluating your flake, hits a private GitHub input, and dies:

error: unable to download 'https://api.github.com/repos/yourorg/private-flake/tarball/...': HTTP error 404

Your flake has private inputs. The access tokens that authenticate those fetches are managed by sops-nix. sops-nix decrypts secrets during NixOS activation. But you haven’t activated the new configuration yet — that’s what you’re trying to do. The host can’t build its own config because building requires tokens that only exist after a successful build.

Welcome to the bootstrap problem.

How sops-nix secrets work (the short version)

If you’re already familiar with sops-nix, skip ahead. For everyone else, here’s the minimum context.

Secrets live in encrypted YAML files in your repo — secrets/myhost.yaml, for instance. A .sops.yaml file at the repo root maps path patterns to the age public keys that can decrypt them. Each NixOS host has an age key derived from its SSH host key:

ssh-keyscan myhost.example.com 2>/dev/null | ssh-to-age

During NixOS activation, sops-nix takes those encrypted YAML files, decrypts them using the host’s age private key, and drops the plaintext into /run/secrets/. One of those secrets is typically nix_builder_access_tokens — a file containing GitHub PATs that Nix reads via nix.extraOptions:

nix.extraOptions = ''
  !include /run/secrets/nix_builder_access_tokens
'';
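For context, the sops-nix side of that wiring is only a few module options. Here's a minimal sketch — the file and key paths are assumptions, and only `nix_builder_access_tokens` is the name used above:

```nix
# Minimal sops-nix wiring sketch; paths are assumptions.
sops.defaultSopsFile = ./secrets/myhost.yaml;

# Derive the host's age private key from its SSH host key.
sops.age.sshKeyPaths = [ "/etc/ssh/ssh_host_ed25519_key" ];

# Decrypt this secret to /run/secrets/nix_builder_access_tokens.
sops.secrets.nix_builder_access_tokens = { };
```

The decrypted file itself typically holds standard nix.conf access-tokens syntax, something like `access-tokens = github.com=ghp_<your-pat>`.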

This is the mechanism. Encrypted at rest, decrypted on activation, consumed by Nix for authenticated fetches. It works beautifully — once the system is running. The problem is the first time.

The chicken and the egg

After a fresh nixos-anywhere install, the host boots into a minimal NixOS. It has an SSH host key (nixos-anywhere generated one), so it could decrypt sops secrets — if those secrets had been deployed. But they haven’t been, because the first real nixos-rebuild switch is what deploys them.

And that first nixos-rebuild switch can’t complete, because it needs to fetch private flake inputs, which requires the access tokens, which are in the sops secrets, which haven’t been deployed yet.

Fresh install
  → needs nixos-rebuild switch to deploy sops secrets
    → nixos-rebuild needs private flake inputs
      → private flake inputs need access tokens
        → access tokens are in sops secrets
          → sops secrets haven't been deployed yet
            → goto: needs nixos-rebuild switch

If you’ve ever written a Nix expression that infinitely recurses, this should feel familiar. Same energy, different layer of the stack.

The fix: build somewhere else

The insight is simple: the target host can’t build its own config, but some other host that already has decrypted tokens can. You just need to separate where the build happens from where the result gets activated.

nixos-rebuild has two flags for exactly this:

  • --target-host — the machine where the built system closure gets copied to and activated
  • --build-host — the machine where the Nix build actually runs

nix run nixpkgs#nixos-rebuild -- switch \
  --flake .#myhost \
  --target-host root@myhost.example.com \
  --build-host root@builder.example.com

This tells nixos-rebuild: SSH into builder.example.com, run the build there (where access tokens are already decrypted and available), copy the resulting closure to myhost.example.com, and activate it. The target host never needs to fetch a single flake input. It just receives a pre-built system closure and switches to it.

Why nix run nixpkgs#nixos-rebuild

If you’re managing NixOS servers from a Mac (as I am), nixos-rebuild isn’t in your PATH — it’s a NixOS tool, not a Nix tool. nix run nixpkgs#nixos-rebuild -- runs it from nixpkgs without installing it. The -- separates nix run flags from nixos-rebuild flags. This is one of those things that’s obvious in hindsight and confusing for about fifteen minutes when you first encounter it.

A real workflow: dedicated server after nixos-anywhere

Here’s the concrete scenario. You’ve just deployed NixOS onto a Kimsufi dedicated server using nixos-anywhere. The server rebooted, you can SSH in, NixOS is running. But this is the bare-bones config — no sops secrets deployed yet.

First, grab the new host’s age key (nixos-anywhere generates a new SSH host key on every run):

ssh-keyscan anubis.example.com 2>/dev/null | ssh-to-age

Update .sops.yaml with the new key, then re-encrypt:

sops updatekeys secrets/anubis.yaml
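The .sops.yaml change amounts to adding the new age key and making sure a creation rule covers the host's secrets file. A sketch — the key value is a placeholder, not a real key:

```yaml
# .sops.yaml sketch; the age key below is a placeholder
keys:
  - &anubis age1qqqplaceholderkeyvalue
creation_rules:
  - path_regex: secrets/anubis\.yaml$
    key_groups:
      - age:
          - *anubis
```

sops updatekeys reads this file, so the new key must be in place before you run it.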

Now the bootstrap deploy, building on an existing host that already has tokens:

nix run nixpkgs#nixos-rebuild -- switch \
  --flake .#anubis \
  --target-host root@anubis.example.com \
  --build-host root@builder.example.com

The builder compiles the full system closure — fetching private inputs with its own decrypted tokens — and ships it to the new host. NixOS activates, sops-nix runs, and suddenly anubis has its own decrypted access tokens sitting in /run/secrets/.

From this point forward, the host can build its own configuration. Subsequent deploys are straightforward:

ssh anubis nixos-rebuild switch --flake .#anubis

The bootstrap problem is a one-time obstacle. You climb over it once with --build-host, and then it’s gone.

When the build host is your local machine

You don’t always have a separate build server. If your local machine has the tokens (because you’re working from a checkout of the flake and your Nix config includes the access tokens), you can omit --build-host entirely:

nix run nixpkgs#nixos-rebuild -- switch \
  --flake .#myhost \
  --target-host root@myhost.example.com

This builds locally and copies the closure to the target. If you’re on a Mac, this means cross-compiling for Linux (or using a remote builder), which is its own adventure. But if you’re on a Linux workstation with the right tokens configured, this is the simplest path.
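If you do go the remote-builder route from a Mac, the minimal version is a builders line in nix.conf pointing at a Linux machine (the hostname here is this post's example builder; real setups usually add SSH key and job-count fields):

```
# /etc/nix/nix.conf on the Mac: delegate x86_64-linux builds over SSH
builders = ssh-ng://root@builder.example.com x86_64-linux
```

With that in place, Linux builds started from the Mac get forwarded to the builder transparently.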

Beyond bootstrap

The --build-host / --target-host pattern isn’t only useful for the sops bootstrap. It’s the right tool any time the target can’t or shouldn’t build its own config:

  • Underpowered hardware — Raspberry Pis, small VPSes, anything where a full Nix build would take an hour or swap itself to death.
  • Cross-compilation avoidance — building an aarch64 config on an x86_64 host (or vice versa) via a same-architecture build host is often faster than cross-compiling locally.
  • Minimal attack surface — some production hosts intentionally don’t have Nix build capabilities. They receive pre-built closures and activate them. No compiler, no fetcher, no GitHub tokens on the box at all.

The bootstrap problem is the most annoying instance of needing this pattern, but it’s far from the only one.

The full picture

Putting it all together, here’s the lifecycle of a new NixOS host from bare metal to self-sufficient:

  1. nixos-anywhere installs NixOS onto the target (minimal config, no secrets)
  2. ssh-keyscan | ssh-to-age gets the new host’s age public key
  3. sops updatekeys re-encrypts secrets for the new key
  4. nixos-rebuild --build-host does the first deploy using an existing host's tokens
  5. sops-nix activates and decrypts access tokens on the target
  6. The host can now self-build — subsequent deploys need no external builder
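Steps 2 through 4 condense into a short command sequence. A sketch using this post's hostnames — note the manual .sops.yaml edit in the middle, so this is a checklist more than a run-straight-through script:

```shell
#!/usr/bin/env bash
set -euo pipefail

host="anubis.example.com"     # the freshly installed target
builder="builder.example.com" # an existing host with decrypted tokens
attr="anubis"                 # flake attribute name

# Step 2: print the new host's age public key
ssh-keyscan "$host" 2>/dev/null | ssh-to-age

# ...add the printed key to .sops.yaml by hand, then:

# Step 3: re-encrypt the host's secrets for the new key
sops updatekeys "secrets/${attr}.yaml"

# Step 4: first deploy, built where the tokens already exist
nix run nixpkgs#nixos-rebuild -- switch \
  --flake ".#${attr}" \
  --target-host "root@${host}" \
  --build-host "root@${builder}"
```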

Steps 2-4 are the bootstrap dance. You do it once per host, and then you never think about it again — until the next fresh install, when you’ll wish you had bookmarked this post.