Reliable Headless Raspberry Pi Provisioning on NixOS

Headless Raspberry Pi provisioning sounds like it should be the easy case. Build an image, flash it, power it on, and SSH in. In practice, the first boot is where all the brittleness hides. If Wi-Fi is not up, if the host key is not the one you expected, or if secret decryption depends on services that themselves depend on decrypted secrets, you do not get a graceful fallback. You get a Pi that never appears on the network.

In this repo, the workflow we ended up trusting is:

nix run .#raspberry-pi-provision-image -- <host> <output-path-or-directory>

That wrapper builds the host’s NixOS sdImage, copies it to the destination you asked for, and injects the bootstrap files the machine needs before its first boot. For the bastet host, those files are:

  • /var/lib/bootstrap/ssh_host_ed25519_key
  • /var/lib/bootstrap/ssh_host_ed25519_key.pub
  • /var/lib/networkmanager/system-connections.env

Those three files solve the hard part: stable SSH identity and working Wi-Fi from the first power-on, before the device is reachable for any remote deployment step.

The important lesson was not “use this exact script.” It was more general: for headless bootstrap on NixOS, image-time secret injection is often more reliable than trying to decrypt first-boot secrets on the device itself. And if you do inject mutable bootstrap state, it belongs under /var/lib/..., not /etc.

The provisioning goal

The target was simple enough:

  1. Build a NixOS image for a Pi host.
  2. Put that image on removable media.
  3. Power on the Pi somewhere inconvenient.
  4. Have it come up on Wi-Fi immediately with a predictable SSH host key.

That last point matters more than it sounds. If the host key is precomputed, you can pin trust ahead of time instead of accepting a surprise key on first contact. If Wi-Fi credentials are already in place, the Pi can join the network without any keyboard, monitor, serial console, or “just plug it into Ethernet for the first boot” workaround.
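
As a rough sketch of what pinning ahead of time can look like, the public half of the precomputed key can be turned into a known_hosts line before the Pi has ever booted. The helper name known_hosts_line and the file locations are illustrative assumptions, not part of the repo's wrapper.

```shell
# Build a known_hosts entry from a precomputed public host key.
# known_hosts_line is a hypothetical helper, not part of the repo's tooling.
known_hosts_line() {
  host=$1
  pubkey_file=$2
  key_type=
  key_blob=
  # A public key file holds "<type> <base64-blob> [comment]"; the comment is dropped.
  read -r key_type key_blob _ < "$pubkey_file" || true
  if [ -z "$key_type" ] || [ -z "$key_blob" ]; then
    echo "known_hosts_line: $pubkey_file does not look like a public key" >&2
    return 1
  fi
  printf '%s %s %s\n' "$host" "$key_type" "$key_blob"
}
```

Something like `known_hosts_line bastet ./ssh_host_ed25519_key.pub >> ~/.ssh/known_hosts` would then record the key on the workstation, so the first SSH connection is checked against it instead of prompting.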

For a small headless device, first boot is not the time to discover that your secret-management chain is technically elegant but operationally fragile.

The first idea: decrypt on the Pi at boot

The original design was more declarative on paper.

Keep the image generic. Store Wi-Fi credentials in SOPS. Let the Pi decrypt them on first boot. Use the host’s precomputed SSH key as the age identity. Then let the networking stack consume the decrypted output and bring the machine online.

I still think this is an understandable instinct. It matches the way we want the rest of NixOS to work:

  • secrets live encrypted in the repo
  • the machine decrypts what it needs locally
  • runtime services consume those secrets from known paths

The problem is that on a fresh, headless Pi, this creates a bootstrap chain with too many timing-sensitive links:

  1. The host key has to be present.
  2. SOPS has to use that key successfully.
  3. The decrypted Wi-Fi secret has to land where networking expects it.
  4. Networking has to come up correctly on the first try.
  5. Remote access depends on all of the above already working.

That is not impossible. It is just brittle enough that when it fails, recovery is annoying and usually physical.

This is the same family of problem I wrote about in “Solving the NixOS SOPS Bootstrap Problem,” but harsher. A server that fails a bootstrap deploy can often still be reached over SSH or through a provider console. A Raspberry Pi on somebody else’s shelf does not usually give you that luxury.

What actually failed

Several separate issues pushed us away from first-boot decryption and toward provisioning-time injection.

Injecting into /etc was the wrong model

The first mistake was treating /etc as a safe place to drop out-of-band bootstrap files into the image.

That works on plenty of conventional Linux setups. On NixOS, it is the wrong mental model. /etc is declarative and system-managed. If you manually inject files into it before boot, you are competing with the activation logic that will happily regenerate or replace parts of it from the system configuration.

That gave us two flavors of breakage:

  • Wi-Fi-related files could disappear before the service that needed them had actually consumed them.
  • The injected SSH host key could exist on the first boot and then fail to persist the way we intended across later boots.

This was the turning point in the design. The problem was not only “how do we get the secret there?” It was “what kind of path is appropriate for mutable bootstrap state on NixOS?”

The answer is not /etc. It is /var/lib/....

SOPS-at-boot was too timing-sensitive

The next problem was the decryption chain itself.

On a running NixOS machine, SOPS-managed secrets are great. On a headless first boot where networking depends on the secret being available immediately, they become part of the boot-critical path. That means every dependency matters:

  • the key material has to be present already
  • the decryption step has to run at the right time
  • the consumer has to wait for the output
  • the output path has to survive activation

If any part of that chain is wrong, the Pi does not partially succeed. It just never joins the network. That is exactly the sort of failure mode you want to design out of a headless device.

In other words, the issue was not that SOPS is bad. The issue was asking first-boot decryption to solve a bootstrap problem that is easier to solve one step earlier, on the provisioning machine.

wpa_supplicant made host-local bootstrap harder than it needed to be

The earliest iterations were built around wpa_supplicant and used host-local wireless configuration paths. That could be made to work, but it pushed more host-specific logic into the image build than I wanted, and it was not pleasant to reason about.

We eventually moved the shared Raspberry Pi path to NetworkManager, which made the image behavior more uniform and the bootstrap layout simpler. That let us inject one runtime file:

/var/lib/networkmanager/system-connections.env

and have the image consume it from a persistent runtime location instead of trying to smuggle mutable first-boot state through a declarative /etc path.
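
The post does not show the contents of that file, so the following is only a sketch of how a provisioner might render it from values that are already decrypted. WIFI_SSID and WIFI_PSK are assumed variable names, and write_wifi_env is a hypothetical helper. The real file may use different keys.

```shell
# Render a NetworkManager environment file from already-decrypted values.
# WIFI_SSID and WIFI_PSK are assumed variable names; the real file may differ.
write_wifi_env() {
  out=$1
  ssid=$2
  psk=$3
  # Refuse empty values: a blank SSID or PSK would only fail later, on the Pi.
  if [ -z "$ssid" ] || [ -z "$psk" ]; then
    echo "write_wifi_env: SSID and PSK must both be non-empty" >&2
    return 1
  fi
  # Remove any stale copy so the new file is created with owner-only permissions.
  rm -f -- "$out"
  ( umask 077 && printf 'WIFI_SSID=%s\nWIFI_PSK=%s\n' "$ssid" "$psk" > "$out" )
}
```

The values are written verbatim, so any quoting or escaping rules of whatever consumes the file would still need to be handled separately.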

This is not an argument that wpa_supplicant is wrong in general. It is an argument that, in this setup, NetworkManager was easier to make boring. And boring is exactly what you want from bootstrap networking.

The provisioner initially lied to us

This one was especially sneaky.

An early version of the image wrapper reported success even when its debugfs writes had failed. The writes failed because the target directories did not yet exist inside the ext4 image, and nothing in the script noticed. So the script looked happy, the image looked provisioned, and the Pi still came up missing the files we thought we had injected.

That is the worst kind of automation bug: a false positive.

We fixed it by making the provisioner much less trusting:

  • create the destination directories inside the image explicitly
  • write the files only after those directories exist
  • assert that the injected files are present afterward
  • verify the finished image contents directly

Once the script started failing hard instead of cheerfully pretending, the whole flow became much easier to trust.
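
The four steps above can be sketched roughly as follows. This is not the repo's actual wrapper. It assumes the image argument is a bare ext4 filesystem (not a full partitioned disk image), that debugfs from e2fsprogs is available, and that paths contain no spaces. inject_file is a hypothetical name. The key idea is to read the file back out of the image rather than rely on the exit status of the write.

```shell
# Inject one file into a bare ext4 image and fail loudly if it did not land.
# Assumes $1 is the ext4 root filesystem itself, not a partitioned disk image.
inject_file() {
  image=$1
  src=$2
  dest=$3   # absolute path inside the image

  # An empty source would make the read-back comparison meaningless.
  [ -s "$src" ] || { echo "inject_file: $src is missing or empty" >&2; return 1; }

  # Create each parent directory explicitly, one level at a time.
  # Errors for directories that already exist are ignored on purpose,
  # because the read-back below is what decides success.
  rest=${dest%/*}
  rest=${rest#/}
  path=
  while [ -n "$rest" ]; do
    part=${rest%%/*}
    path="$path/$part"
    debugfs -w -R "mkdir $path" "$image" >/dev/null 2>&1 || true
    case $rest in
      */*) rest=${rest#*/} ;;
      *) rest= ;;
    esac
  done

  debugfs -w -R "write $src $dest" "$image" >/dev/null 2>&1 || true

  # Do not trust the write. Read the file back out of the image and
  # compare it byte for byte with the source.
  if ! debugfs -R "cat $dest" "$image" 2>/dev/null | cmp -s - "$src"; then
    echo "inject_file: $dest is missing or differs inside $image" >&2
    return 1
  fi
}
```

Ownership and file mode inside the image are not handled in this sketch. They matter for an SSH host key and would need their own explicit step and check.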

The design we kept

The reliable version of the workflow is much simpler:

  1. Keep the source secrets in SOPS in the repo.
  2. Decrypt them on the provisioning machine, where the age identity already exists.
  3. Build the host’s sdImage.
  4. Inject the runtime files directly into the image before first boot.
  5. Store those files under persistent runtime paths in /var/lib/....
  6. Point services at those paths directly.

For this host, that means:

  • OpenSSH uses /var/lib/bootstrap/ssh_host_ed25519_key
  • NetworkManager reads /var/lib/networkmanager/system-connections.env

That design is slightly less symmetrical than “the machine decrypts everything itself at boot,” but it is much more reliable in the only moment that really matters: the first boot when the device is not yet reachable.
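
Step 2 of that list, decrypting on the provisioning machine, might look roughly like this. It is a sketch under assumptions: decrypt_field is a hypothetical helper, the secrets are assumed to be top-level keys in the host's SOPS file, and the sops CLI is assumed to be on the PATH with a usable age identity.

```shell
# Decrypt a single top-level field from a SOPS file into a staging file.
# decrypt_field is a hypothetical helper, not part of the repo's wrapper.
decrypt_field() {
  sops_file=$1
  field=$2
  out=$3
  rm -f -- "$out"
  # umask 077 keeps the plaintext readable by the current user only.
  if ! ( umask 077 && sops -d --extract "[\"$field\"]" "$sops_file" > "$out" ); then
    rm -f -- "$out"
    echo "decrypt_field: could not decrypt $field from $sops_file" >&2
    return 1
  fi
  # An empty result is treated as a failure rather than injected into the image.
  if [ ! -s "$out" ]; then
    rm -f -- "$out"
    echo "decrypt_field: $field decrypted to an empty value" >&2
    return 1
  fi
}
```

A call such as `decrypt_field <host-sops-file> wifi_psk "$staging/wifi_psk"` would then leave the plaintext in a private staging directory, ready to be rendered into the runtime files before injection.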

Why /var/lib is the real lesson

The larger design lesson here is not specific to Raspberry Pis.

If a file is mutable runtime state, or bootstrap state that should survive independently of declarative /etc generation, it probably belongs somewhere under /var/lib. On NixOS, this is not just a filesystem preference. It is a boundary between system-managed declarative configuration and persistent machine-local state.

Once I started looking at the problem through that lens, the right shape became obvious:

  • /etc is for declarative configuration rendered by the system
  • /var/lib is for persistent mutable state owned by services or bootstrap tooling

Trying to force bootstrap secrets into /etc was really a category error.
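
One way to keep that boundary from eroding is to have the provisioner refuse any injection target outside /var/lib. The guard below is a hypothetical sketch, not something the repo's wrapper is known to contain.

```shell
# Reject injection targets that NixOS activation manages.
# assert_state_path is a hypothetical guard, not part of the repo's wrapper.
assert_state_path() {
  case $1 in
    */../*|*/..)
      echo "refusing path containing '..': $1" >&2
      return 1 ;;
    /var/lib/?*)
      return 0 ;;
    /etc|/etc/*)
      echo "refusing to inject into declarative /etc: $1" >&2
      return 1 ;;
    *)
      echo "refusing path outside /var/lib: $1" >&2
      return 1 ;;
  esac
}
```

Calling it before every write, for example `assert_state_path /var/lib/bootstrap/ssh_host_ed25519_key || exit 1`, turns the design rule into a hard failure instead of a convention.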

The tradeoff

The compromise is that the Wi-Fi secret is currently treated as day-0 bootstrap state.

A plain remote nixos-rebuild later does not automatically rotate the injected Wi-Fi credential unless we add a second runtime-managed secret update path for post-bootstrap changes. So yes, the current setup gives up some declarative neatness.

I think that is the correct trade.

Wi-Fi PSKs do not change often. First-boot connectivity absolutely has to work. Once the Pi is online and reachable, every other kind of refinement becomes easy again. The hard problem is not long-term secret rotation. The hard problem is getting a tiny headless box to show up reliably the first time.

If the price of that reliability is that bootstrap networking is provisioned one step earlier, on the machine building the image, I will happily pay it.

The practical workflow

For a new managed Raspberry Pi host, the workflow now looks like this:

  1. Add the host and image target.
  2. Put wifi_ssid, wifi_psk, and the precomputed SSH host key in the host’s SOPS file.
  3. Run:
     nix run .#raspberry-pi-provision-image -- <host> <destination>
  4. Flash the resulting image.
  5. Boot the Pi.
  6. Verify that it is reachable and exporting what you expect:
     ssh-keyscan <ip>
     upsc -l
     curl http://<ip>:9199/ups_metrics

That last verification block is intentionally concrete. When the machine is a small appliance-style host, “can I SSH in?” is necessary but not sufficient. If it is supposed to be handling UPS monitoring or some other edge function, check the real service path immediately while the provisioning details are still fresh in your head.
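
The ssh-keyscan step can also be turned into an actual comparison against the precomputed key instead of an eyeball check. The following is a minimal sketch. host_key_matches is a hypothetical helper, and the public key path is whatever copy of ssh_host_ed25519_key.pub you kept on the provisioning machine.

```shell
# Compare scanned host keys against the precomputed public key.
# Reads ssh-keyscan output on stdin; host_key_matches is a hypothetical helper.
host_key_matches() {
  expected_file=$1
  expected_type=
  expected_blob=
  read -r expected_type expected_blob _ < "$expected_file" || true
  # Exit status 2 means the expected key itself could not be read.
  [ -n "$expected_blob" ] || return 2
  # ssh-keyscan prints "<host> <type> <base64-blob>" per key.
  while read -r scanned_host scanned_type scanned_blob _; do
    case $scanned_host in
      '#'*|'') continue ;;   # skip comment and blank lines
    esac
    if [ "$scanned_type" = "$expected_type" ] && [ "$scanned_blob" = "$expected_blob" ]; then
      return 0
    fi
  done
  return 1
}
```

Usage would be along the lines of `ssh-keyscan -t ed25519 <ip> 2>/dev/null | host_key_matches ./ssh_host_ed25519_key.pub && echo "host key matches"`. A mismatch here means the Pi is not using the injected key, which is worth knowing before trusting anything else it reports.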

Closing thought

I started out wanting a more elegant first-boot story. What I ended up wanting more was one that worked every time.

For headless Raspberry Pi provisioning on NixOS, image-time secret injection turned out to be the pragmatic answer. Decrypt on the provisioning machine. Inject only the runtime files needed for first boot. Put them under /var/lib/..., not /etc. Let the first boot be boring.

That is not the most purely declarative design. It is, however, the one I would trust before mailing a Pi to another building and hoping it comes back on Wi-Fi.