<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="/feed.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
    <title>Ian Johannesen</title>
    <subtitle>Systems Engineer, SRE&#x2F;DevOps, IP Engineer</subtitle>
    <link rel="self" type="application/atom+xml" href="https://perlpimp.net/atom.xml"/>
    <link rel="alternate" type="text/html" href="https://perlpimp.net"/>
    <generator uri="https://www.getzola.org/">Zola</generator>
    <updated>2026-04-16T15:30:00+00:00</updated>
    <id>https://perlpimp.net/atom.xml</id>
    <entry xml:lang="en">
        <title>Visual State Engines for Opaque UIs</title>
        <published>2026-04-16T15:30:00+00:00</published>
        <updated>2026-04-16T15:30:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/visual-state-engines-opaque-uis/"/>
        <id>https://perlpimp.net/blog/visual-state-engines-opaque-uis/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/visual-state-engines-opaque-uis/">&lt;p&gt;The awkward thing about automating some modern browser applications is that, after a certain point, the browser stops being the application.&lt;&#x2F;p&gt;
&lt;p&gt;It is still there. You can still drive Chromium. You can still click buttons, wait for selectors, fill forms, and listen to network requests. But once the real interface has loaded into a canvas, a remote desktop stream, a game engine, a video surface, or a heavily custom renderer, the useful application state may no longer exist in the DOM in any meaningful way.&lt;&#x2F;p&gt;
&lt;p&gt;There is no selector for “the correct tab is active”. There is no structured event for “the list advanced after that click”. There is no API response that tells you whether the modal you care about appeared. There is just a rectangle of pixels.&lt;&#x2F;p&gt;
&lt;p&gt;That changes the shape of the automation problem.&lt;&#x2F;p&gt;
&lt;p&gt;This is not conventional DOM automation. It is closer to writing a state machine whose sensors are screenshots.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-wrong-shape&quot;&gt;The Wrong Shape&lt;&#x2F;h2&gt;
&lt;p&gt;The tempting version of this kind of automation is a long imperative script:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;open browser
log in
click the navigation item
click the target tab
loop:
  screenshot the row
  click the action button
  wait
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That version is attractive because it matches how a human would describe the job. It is also fragile in exactly the way pixel-driven automation tends to be fragile. If login triggers a challenge, the script is lost. If an unexpected popup appears over the canvas, the script keeps clicking underneath it. If the click lands but the UI does not advance, the script has no idea whether the application is slow, the coordinate is wrong, or the screen is no longer where it thinks it is.&lt;&#x2F;p&gt;
&lt;p&gt;The working shape is usually not one big script. It is layered state engines.&lt;&#x2F;p&gt;
&lt;p&gt;One layer prepares the browser and account session. It knows about browser launch, navigation, login, consent banners, challenges, blocking popups, and whether the rendered app is ready. Another layer owns the opaque UI itself: canvas clicks, visual anchors, active tabs, list rows, button targeting, popup detection, and proof that a state transition happened. A third layer turns the whole thing into a runtime: scheduling, retries, safe stop&#x2F;start, timeline events, artifacts, metrics, and operational recovery.&lt;&#x2F;p&gt;
&lt;p&gt;That separation matters. Browser preparation, visual navigation, and autonomous operation fail in different ways. If they are collapsed into one linear script, every failure looks like “the bot broke”. If they are separate state engines, each layer gets its own vocabulary for success, blockage, evidence, and recovery.&lt;&#x2F;p&gt;
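&lt;p&gt;Here is a minimal sketch of what “its own vocabulary” can mean for the browser-preparation layer. It assumes some detector has already reduced the screen to a few boolean observations. The &lt;code&gt;Session&lt;&#x2F;code&gt; states, the observation keys and &lt;code&gt;classify_session&lt;&#x2F;code&gt; are hypothetical names, not part of any library:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Hypothetical session-layer state engine: each observation maps to a named
# state, so a failure reads as "blocked by a challenge", not "the bot broke".
from enum import Enum


class Session(Enum):
    LAUNCHING = "launching"
    LOGIN = "login"
    CHALLENGE = "challenge"   # needs a human or a separate solver
    POPUP = "popup"           # something is covering the canvas
    READY = "ready"           # the rendered app is usable


def classify_session(obs):
    """Map a dict of boolean observations (assumed keys) to a session state."""
    if obs.get("challenge_visible"):
        return Session.CHALLENGE
    if obs.get("popup_visible"):
        return Session.POPUP
    if obs.get("login_form_visible"):
        return Session.LOGIN
    if obs.get("app_canvas_ready"):
        return Session.READY
    return Session.LAUNCHING


# Blocking states stop this layer and hand control to recovery, instead of
# letting the UI layer click into a screen it does not own.
BLOCKING = {Session.CHALLENGE, Session.POPUP}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;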
&lt;p&gt;In practice, this means every step should answer four questions:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;What action am I about to take?&lt;&#x2F;li&gt;
&lt;li&gt;What evidence proves it worked?&lt;&#x2F;li&gt;
&lt;li&gt;What evidence means I am blocked?&lt;&#x2F;li&gt;
&lt;li&gt;What artifacts should I keep if I am wrong?&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Without those answers, the automation is mostly hoping.&lt;&#x2F;p&gt;
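&lt;p&gt;One way to make the four questions concrete is to refuse to run a step that cannot answer them. This is a minimal sketch that assumes you supply callables for the action and the probes. &lt;code&gt;Step&lt;&#x2F;code&gt; and &lt;code&gt;run_step&lt;&#x2F;code&gt; are hypothetical names:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# A hypothetical step contract: every step declares its action, its success
# evidence, its blocked evidence, and what to keep if it turns out to be wrong.
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    act: Callable[[], None]                 # the action about to be taken
    succeeded: Callable[[], bool]           # evidence that it worked
    blocked: Callable[[], bool]             # evidence that we are blocked
    save_artifacts: Callable[[str], None]   # what to keep if we are wrong


def run_step(step, timeout=5.0, poll=0.1):
    """Run one step and return "ok", "blocked" or "ambiguous"."""
    step.act()
    deadline = time.monotonic() + timeout
    while deadline > time.monotonic():
        if step.succeeded():
            return "ok"
        if step.blocked():
            step.save_artifacts(step.name + "-blocked")
            return "blocked"
        time.sleep(poll)
    # Neither success nor blockage was observed: keep the evidence for review.
    step.save_artifacts(step.name + "-ambiguous")
    return "ambiguous"
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Keeping “ambiguous” as its own outcome, rather than folding it into failure, lets a runtime mark a state for review instead of retrying blindly.&lt;&#x2F;p&gt;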
&lt;h2 id=&quot;pixels-as-state&quot;&gt;Pixels as State&lt;&#x2F;h2&gt;
&lt;p&gt;Once the useful interface is opaque, screenshots become the API.&lt;&#x2F;p&gt;
&lt;p&gt;The system has to infer state from evidence like this:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Is the target panel visible?&lt;&#x2F;li&gt;
&lt;li&gt;Which tab or mode is active?&lt;&#x2F;li&gt;
&lt;li&gt;Is the list empty?&lt;&#x2F;li&gt;
&lt;li&gt;Where are the visible row bands?&lt;&#x2F;li&gt;
&lt;li&gt;Where is the action button inside the current row?&lt;&#x2F;li&gt;
&lt;li&gt;Did the screen change after the click?&lt;&#x2F;li&gt;
&lt;li&gt;Did a confirmation popup appear?&lt;&#x2F;li&gt;
&lt;li&gt;Does the next captured row match what the previous click should have produced?&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;None of those are DOM questions. They are image questions.&lt;&#x2F;p&gt;
&lt;p&gt;The robust version is dynamic scanning. You crop meaningful regions and detect row bands from projection profiles, meaning how much pixel activity each horizontal line of the crop contains. You search for colored connected components that look like buttons. You compare before and after crops with visual change scores. You hash regions and compare signatures. You detect active tabs, empty states, and confirmation popups from image evidence.&lt;&#x2F;p&gt;
&lt;p&gt;This is resilient because it keeps asking the screen what actually happened. If a row is a few pixels taller than expected, dynamic row detection can still find it. If a button shifts, component detection can target the button instead of trusting a stale coordinate. If a click is swallowed, the before&#x2F;after probes can say “no meaningful state change happened” instead of blindly moving on.&lt;&#x2F;p&gt;
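&lt;p&gt;The row-band, change-score and signature ideas can be sketched in plain Python. This assumes a crop has already been decoded into rows of grayscale values from 0 to 255. The function names and thresholds are illustrative and not taken from any particular library. A production version would more likely use NumPy or OpenCV for speed:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Dynamic scanning sketch on a grayscale crop stored as a list of rows (0-255).
# Thresholds are illustrative assumptions, not tuned values.


def row_activity(img):
    """Horizontal projection: total left-to-right intensity change per row."""
    return [sum(abs(row[x + 1] - row[x]) for x in range(len(row) - 1)) for row in img]


def row_bands(img, min_activity=50, min_height=2):
    """Group consecutive active rows into (top, bottom) bands, bottom exclusive."""
    bands, start = [], None
    # The trailing -1 is a sentinel that closes a band touching the bottom edge.
    for y, act in enumerate(row_activity(img) + [-1]):
        active = act >= min_activity
        if active and start is None:
            start = y
        elif not active and start is not None:
            if y - start >= min_height:
                bands.append((start, y))
            start = None
    return bands


def visual_change_score(before, after):
    """Mean absolute per-pixel difference between two same-sized crops."""
    total = sum(abs(a - b) for ra, rb in zip(before, after) for a, b in zip(ra, rb))
    return total / (len(before) * len(before[0]))


def average_hash(img):
    """Average hash: one bit per pixel, set when brighter than the crop mean."""
    pixels = [p for row in img for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(p > mean for p in pixels)


def hash_distance(h1, h2):
    """Number of differing bits; a small distance means the regions look alike."""
    return sum(a != b for a, b in zip(h1, h2))
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;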
&lt;p&gt;But dynamic scanning has a cost.&lt;&#x2F;p&gt;
&lt;p&gt;Every broad screenshot has to be captured by the browser, encoded, transported back to the automation process, decoded, cropped, scanned, hashed, and compared. In the hot path, that cost compounds quickly. The expensive part is not only the image algorithm. It is the screenshot pipeline around the algorithm.&lt;&#x2F;p&gt;
&lt;p&gt;That is easy to underestimate. You look at a helper called something like &lt;code&gt;visual_change_score&lt;&#x2F;code&gt; and think about CPU. But the real loop may be doing repeated screenshot captures, waiting between probes, and decoding images just to decide whether one click worked.&lt;&#x2F;p&gt;
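&lt;p&gt;Before optimizing the image math, it helps to measure where a probe actually spends its time. This is a minimal sketch using only the standard library. The stage names and the &lt;code&gt;timed&lt;&#x2F;code&gt; helper are hypothetical:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Per-stage timing, to see whether the cost sits in the image algorithm or in
# the screenshot pipeline around it. Stage names here are illustrative.
import time
from collections import defaultdict
from contextlib import contextmanager

stage_totals = defaultdict(float)
stage_counts = defaultdict(int)


@contextmanager
def timed(stage):
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_totals[stage] += time.perf_counter() - start
        stage_counts[stage] += 1


def report():
    """Average milliseconds per stage, slowest first."""
    averages = {s: 1000 * stage_totals[s] / stage_counts[s] for s in stage_totals}
    return sorted(averages.items(), key=lambda kv: kv[1], reverse=True)


# Usage inside a probe (capture, decode and score are whatever your stack uses):
# with timed("capture"): raw = capture()
# with timed("decode"):  img = decode(raw)
# with timed("score"):   score = visual_change_score(before, img)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Timing capture, decode and scoring separately shows whether the screenshot round-trip or the comparison dominates a click-confirmation cycle.&lt;&#x2F;p&gt;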
&lt;p&gt;For a one-off debugging run, that may be fine. For a runtime that needs to repeat the same operation hundreds or thousands of times, it is the difference between a plausible service and a sluggish science project.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;calibrated-pixels&quot;&gt;Calibrated Pixels&lt;&#x2F;h2&gt;
&lt;p&gt;The opposite approach is calibrated pixels.&lt;&#x2F;p&gt;
&lt;p&gt;Instead of searching the whole UI for a known tab, use a pinned click point. Instead of inferring active state from a broad region, sample a tiny swatch inside each tab body. Instead of rediscovering list geometry every time, configure the first row crop and row pitch. Model the coordinates against a known viewport and scale them to the runtime screenshot size.&lt;&#x2F;p&gt;
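&lt;p&gt;Here is a sketch of that geometry model. It assumes a reference viewport of 1920x1080 and uses made-up coordinates. In a real setup these values would come from configuration and a calibration pass:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Calibrated geometry modelled against a reference viewport and scaled to the
# runtime screenshot. All numbers below are made-up placeholders.
MODEL_W, MODEL_H = 1920, 1080

TAB_CLICK = (412, 96)                 # pinned click point for a tab
FIRST_ROW = (300, 240, 1200, 64)      # x, y, width, height of the first row crop
ROW_PITCH = 72                        # vertical distance between row tops


def scale_point(x, y, shot_w, shot_h):
    """Scale a model-space point to the runtime screenshot size."""
    return round(x * shot_w / MODEL_W), round(y * shot_h / MODEL_H)


def row_rect(index, shot_w, shot_h):
    """Scaled (x, y, w, h) of row `index`, derived from the first row and pitch."""
    x, y, w, h = FIRST_ROW
    sx, sy = shot_w / MODEL_W, shot_h / MODEL_H
    return (round(x * sx), round((y + index * ROW_PITCH) * sy), round(w * sx), round(h * sy))
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;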
&lt;p&gt;This feels wrong at first. Hardcoded pixels are the sort of thing people make fun of in automation code, often for good reason. They are brittle. They depend on viewport, scale, layout, language, theme, and whatever the product designers changed this week.&lt;&#x2F;p&gt;
&lt;p&gt;But calibrated pixels are also extremely fast.&lt;&#x2F;p&gt;
&lt;p&gt;A fixed click point does not need image search. A tiny tab swatch is cheaper than analyzing a broad tab body. A known first-row crop is cheaper than rediscovering every candidate row. If the hot path is “open the next row and prove that it advanced”, small probes beat wide scans.&lt;&#x2F;p&gt;
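&lt;p&gt;A tiny swatch check can be as simple as averaging a handful of pixels and comparing the result against a calibrated reference color. The reference color and tolerance below are placeholders. The pixels are assumed to be RGB tuples already read from a crop:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Sample a tiny swatch (a few pixels inside a tab body) and compare its mean
# color to a calibrated reference. Color and tolerance are placeholders.
ACTIVE_TAB_RGB = (232, 176, 64)
TOLERANCE = 24


def mean_rgb(pixels):
    """Mean (r, g, b) over a small list of RGB tuples."""
    n = len(pixels)
    return tuple(sum(p[i] for p in pixels) / n for i in range(3))


def swatch_matches(pixels, reference=ACTIVE_TAB_RGB, tolerance=TOLERANCE):
    """True when every channel of the swatch mean is within tolerance."""
    return all(tolerance >= abs(m - r) for m, r in zip(mean_rgb(pixels), reference))
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;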
&lt;p&gt;The trick is to make the brittleness operational rather than mysterious.&lt;&#x2F;p&gt;
&lt;p&gt;Calibrated systems need:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;a controlled viewport model;&lt;&#x2F;li&gt;
&lt;li&gt;explicit coordinate and region configuration;&lt;&#x2F;li&gt;
&lt;li&gt;debug artifacts that show what the automation saw;&lt;&#x2F;li&gt;
&lt;li&gt;timing metadata for slow or ambiguous transitions;&lt;&#x2F;li&gt;
&lt;li&gt;a fast recalibration path when the UI shifts.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That makes a stale coordinate a normal maintenance event instead of a haunted failure. The coordinate is still brittle, but the system tells you which visual assumption broke and gives you the evidence needed to fix it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-useful-compromise&quot;&gt;The Useful Compromise&lt;&#x2F;h2&gt;
&lt;p&gt;The design I trust most is the hybrid.&lt;&#x2F;p&gt;
&lt;p&gt;Use calibrated coordinates for the hot path, but do not blindly trust them. A click is only useful if a small amount of image evidence confirms the transition.&lt;&#x2F;p&gt;
&lt;p&gt;For tab switching, a primary and fallback click point can replace broad candidate search. After the click, tiny swatches can confirm which tab is active. If the swatch evidence is ambiguous, a broader region comparison can still act as fallback validation.&lt;&#x2F;p&gt;
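&lt;p&gt;The tab-switching flow can be sketched like this. The click and probe functions are hypothetical callables that stand in for whatever your browser driver and swatch logic provide:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Hypothetical tab switch: a pinned primary click, a pinned fallback click,
# cheap swatch confirmation, and a broader region check only when the swatch
# evidence is ambiguous.


def switch_tab(click, swatch_state, broad_check, primary, fallback):
    """Return the click point that was confirmed, or None if neither was.

    swatch_state() returns "active", "inactive" or "ambiguous".
    broad_check() returns True if a wider region comparison confirms the tab.
    """
    for point in (primary, fallback):
        click(point)
        state = swatch_state()
        if state == "active":
            return point
        # Pay for the broad comparison only when the cheap probe cannot decide.
        if state == "ambiguous" and broad_check():
            return point
    return None
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;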
&lt;p&gt;For row-oriented flows, the system can capture the row before taking action, choose a click point, dispatch the click, and then wait for proof. That proof might come from a confirmation popup, a list-region change, a row-identity comparison, or a hash match against the row that should have shifted into place.&lt;&#x2F;p&gt;
&lt;p&gt;There is an especially practical optimization here: avoid taking separate browser screenshots for every proof crop. Compute one enclosing checkpoint region, capture it once, and crop the smaller regions in memory. That keeps the evidence model but reduces screenshot round-trips.&lt;&#x2F;p&gt;
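&lt;p&gt;Here is a sketch of the single-capture approach. It assumes rectangles are (x, y, width, height) in page coordinates and that the captured checkpoint is a list of pixel rows. How the region is captured depends on your driver; Playwright, for instance, accepts a &lt;code&gt;clip&lt;&#x2F;code&gt; rectangle on &lt;code&gt;page.screenshot&lt;&#x2F;code&gt;. The helper names are hypothetical:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# One capture, many proof crops: compute the rectangle enclosing every probe
# region, capture it once, then crop each probe in memory.


def enclosing_rect(rects):
    """Smallest (x, y, w, h) rectangle containing all given rectangles."""
    left = min(x for x, _, _, _ in rects)
    top = min(y for _, y, _, _ in rects)
    right = max(x + w for x, _, w, _ in rects)
    bottom = max(y + h for _, y, _, h in rects)
    return left, top, right - left, bottom - top


def crop_from_checkpoint(checkpoint, origin, rect):
    """Crop `rect` (page coordinates) out of a checkpoint captured at `origin`."""
    ox, oy = origin
    x, y, w, h = rect
    return [row[x - ox:x - ox + w] for row in checkpoint[y - oy:y - oy + h]]


# Usage sketch (capture() is a stand-in for your driver):
# probes = {"row": (300, 240, 1200, 64), "popup": (700, 400, 520, 200)}
# region = enclosing_rect(list(probes.values()))
# checkpoint = capture(region)            # one screenshot round-trip
# crops = {k: crop_from_checkpoint(checkpoint, region[:2], r) for k, r in probes.items()}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;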
&lt;p&gt;This is the core engineering lesson: do the minimum visual work that can prove the transition.&lt;&#x2F;p&gt;
&lt;p&gt;Not the minimum work that can probably get away with it. Not the maximum work that makes the system feel clever. The minimum work that can prove the state transition you are about to depend on.&lt;&#x2F;p&gt;
&lt;p&gt;In a good hybrid system:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;fixed coordinates handle the common path;&lt;&#x2F;li&gt;
&lt;li&gt;tiny swatches confirm cheap state;&lt;&#x2F;li&gt;
&lt;li&gt;cropped probes prove transitions;&lt;&#x2F;li&gt;
&lt;li&gt;dynamic scanning remains available where drift is likely;&lt;&#x2F;li&gt;
&lt;li&gt;artifacts and timing metadata make recalibration routine.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Fast automation without artifacts is a future debugging trap. Debuggable automation can afford to be more aggressive, because when it fails you can see what it thought it saw.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;screenshot-format-is-part-of-the-algorithm&quot;&gt;Screenshot Format Is Part of the Algorithm&lt;&#x2F;h2&gt;
&lt;p&gt;One of the easiest performance lessons to miss is that screenshot format matters.&lt;&#x2F;p&gt;
&lt;p&gt;It is natural to start with PNG files. PNG is lossless, and if cropped images eventually feed OCR or visual matching, you do not want compression artifacts in the data you are trying to read.&lt;&#x2F;p&gt;
&lt;p&gt;But screenshots in the hot loop are not just data. They are transport. The browser has to encode them. The automation process has to receive them. The runtime has to decode them. When you do that repeatedly, the image format becomes part of the algorithm.&lt;&#x2F;p&gt;
&lt;p&gt;In one recent visual automation loop, moving browser screenshots from PNG to JPEG made each screenshot roughly twice as fast, dropping from about 100 ms to about 50 ms. That is not a micro-optimization when screenshot capture sits inside every click-confirmation cycle. It changes the feel of the whole state engine.&lt;&#x2F;p&gt;
&lt;p&gt;That does not mean “always use JPEG”. OCR crops, audit artifacts, and failure snapshots may still deserve lossless PNG. The useful distinction is between image artifacts as records and screenshots as sensor reads. Records optimize for fidelity and inspectability. Sensor reads also have to optimize for latency.&lt;&#x2F;p&gt;
&lt;p&gt;If screenshots are in your inner loop, encoding and transport are in your inner loop too.&lt;&#x2F;p&gt;
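&lt;p&gt;One way to keep that distinction explicit is to choose screenshot options by purpose. This sketch assumes the browser is driven with Playwright for Python, whose &lt;code&gt;page.screenshot&lt;&#x2F;code&gt; accepts &lt;code&gt;type&lt;&#x2F;code&gt;, &lt;code&gt;quality&lt;&#x2F;code&gt; (JPEG only) and &lt;code&gt;clip&lt;&#x2F;code&gt;. The quality value is a guess to tune, not a measured optimum:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Separate sensor reads from records. The returned keyword arguments match the
# Playwright for Python page.screenshot() options (type, quality, clip).


def screenshot_options(purpose, clip=None):
    if purpose == "sensor":
        # Hot-loop probe: cheaper to encode, ship and decode.
        opts = {"type": "jpeg", "quality": 80}
    elif purpose in ("ocr", "audit", "failure"):
        # Records: lossless, so no compression artifacts reach OCR or review.
        opts = {"type": "png"}
    else:
        raise ValueError(f"unknown screenshot purpose: {purpose}")
    if clip is not None:
        x, y, w, h = clip
        opts["clip"] = {"x": x, "y": y, "width": w, "height": h}
    return opts


# Usage with Playwright (not run here):
# data = page.screenshot(**screenshot_options("sensor", clip=(300, 240, 1200, 64)))
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Plan for one consequence of this choice. JPEG is lossy, so change-score and swatch thresholds applied to sensor reads need enough tolerance to absorb small compression differences between otherwise identical frames.&lt;&#x2F;p&gt;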
&lt;h2 id=&quot;runtime-beats-cleverness&quot;&gt;Runtime Beats Cleverness&lt;&#x2F;h2&gt;
&lt;p&gt;The runtime layer is less glamorous than the vision logic, but it is what makes the system useful.&lt;&#x2F;p&gt;
&lt;p&gt;An autonomous visual collector needs more than a successful happy path. It needs to know when to start. It needs to retry after failure without manual intervention. It needs to stop safely in the middle of a run. It needs to leave behind enough timeline events to explain what happened. It needs to persist partial evidence before taking destructive or irreversible actions, because after a successful click the thing you meant to inspect may be gone.&lt;&#x2F;p&gt;
&lt;p&gt;That last point is important. In visual automation, the evidence often exists only before the action. Capture first, act second, confirm third. If the run dies after the click, the pre-action evidence should not be lost. If the next state is ambiguous, the runtime should be able to mark it for review instead of pretending the input was perfect.&lt;&#x2F;p&gt;
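&lt;p&gt;Here is a minimal sketch of that ordering. It assumes a &lt;code&gt;capture_row&lt;&#x2F;code&gt; callable that returns PNG bytes and a &lt;code&gt;confirm&lt;&#x2F;code&gt; callable that returns an outcome string. The file layout and names are hypothetical:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Capture first, act second, confirm third. Pre-action evidence is written out
# before the click, so it survives a crash or a row that disappears.
import json
import os


def process_row(index, capture_row, click, confirm, out_dir):
    """Return "done" or "review"; the pre-action capture is always kept."""
    os.makedirs(out_dir, exist_ok=True)
    evidence_path = os.path.join(out_dir, f"row-{index:05d}.png")
    with open(evidence_path, "wb") as fh:
        fh.write(capture_row())                 # 1. capture and persist
    click()                                      # 2. act
    outcome = confirm()                          # 3. e.g. "advanced" or "ambiguous"
    status = "done" if outcome == "advanced" else "review"
    # Append a timeline event so the run can be explained afterwards.
    with open(os.path.join(out_dir, "timeline.jsonl"), "a") as log:
        log.write(json.dumps({"row": index, "outcome": outcome, "status": status}) + "\n")
    return status
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;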
&lt;p&gt;The runtime should also measure itself:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;average operation cycle time;&lt;&#x2F;li&gt;
&lt;li&gt;average post-click confirmation time;&lt;&#x2F;li&gt;
&lt;li&gt;screenshot capture latency;&lt;&#x2F;li&gt;
&lt;li&gt;retry counts;&lt;&#x2F;li&gt;
&lt;li&gt;fallback path usage;&lt;&#x2F;li&gt;
&lt;li&gt;ambiguous-state rate;&lt;&#x2F;li&gt;
&lt;li&gt;artifact volume;&lt;&#x2F;li&gt;
&lt;li&gt;estimated throughput under current timing.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Those numbers are not vanity metrics. They are how you decide whether a change actually made the system faster or merely moved the waiting around.&lt;&#x2F;p&gt;
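&lt;p&gt;A small metrics holder is enough to start with. This sketch uses only the standard library. The field names, event labels and per-hour estimate are illustrative assumptions:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Hypothetical rolling metrics for the runtime layer. Throughput is estimated
# from the mean cycle time, so it only holds while timings stay similar.
from collections import Counter
from statistics import mean


class RuntimeMetrics:
    def __init__(self):
        self.cycle_s = []
        self.confirm_s = []
        self.capture_s = []
        self.events = Counter()     # "retry", "fallback", "ambiguous", ...

    def record_cycle(self, cycle_s, confirm_s, capture_s, *events):
        self.cycle_s.append(cycle_s)
        self.confirm_s.append(confirm_s)
        self.capture_s.append(capture_s)
        self.events.update(events)

    def summary(self):
        cycles = len(self.cycle_s)
        avg_cycle = mean(self.cycle_s)
        return {
            "avg_cycle_s": avg_cycle,
            "avg_confirm_s": mean(self.confirm_s),
            "avg_capture_s": mean(self.capture_s),
            "retries": self.events["retry"],
            "fallback_rate": self.events["fallback"] / cycles,
            "ambiguous_rate": self.events["ambiguous"] / cycles,
            "est_per_hour": 3600 / avg_cycle,
        }
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;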
&lt;h2 id=&quot;where-this-pattern-applies&quot;&gt;Where This Pattern Applies&lt;&#x2F;h2&gt;
&lt;p&gt;This pattern is not specific to games. It shows up anywhere the useful state is rendered but not exposed:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;browser apps that hide their real interface in a canvas;&lt;&#x2F;li&gt;
&lt;li&gt;remote desktop or VNC automation;&lt;&#x2F;li&gt;
&lt;li&gt;streamed enterprise software;&lt;&#x2F;li&gt;
&lt;li&gt;kiosk-style web views;&lt;&#x2F;li&gt;
&lt;li&gt;legacy applications with poor accessibility hooks;&lt;&#x2F;li&gt;
&lt;li&gt;visual QA systems;&lt;&#x2F;li&gt;
&lt;li&gt;hardware dashboards rendered through video capture;&lt;&#x2F;li&gt;
&lt;li&gt;any workflow where the only reliable source of truth is the screen.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;All of these systems have the same fork in the road. You can scan dynamically and pay for robustness. You can calibrate pixels and pay for brittleness. Or you can build a hybrid where fixed coordinates drive the hot path and small probes prove that each transition happened.&lt;&#x2F;p&gt;
&lt;p&gt;Reach for dynamic scanning when:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;the UI moves often;&lt;&#x2F;li&gt;
&lt;li&gt;the action is rare enough that performance does not matter much;&lt;&#x2F;li&gt;
&lt;li&gt;the cost of a false click is high;&lt;&#x2F;li&gt;
&lt;li&gt;you do not yet know the stable geometry.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Reach for calibrated pixels when:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;the same interaction happens many times;&lt;&#x2F;li&gt;
&lt;li&gt;the viewport can be controlled;&lt;&#x2F;li&gt;
&lt;li&gt;there is a cheap visual confirmation after the click;&lt;&#x2F;li&gt;
&lt;li&gt;failures produce artifacts that make recalibration straightforward.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The practical design usually lands in the middle. It is not a pure computer-vision system. It is not a pile of magic coordinates. It is a service built around visual state: explicit steps, visual anchors, runtime control, calibrated hot-path inputs, and enough image evidence to keep the whole thing honest.&lt;&#x2F;p&gt;
&lt;p&gt;When the application state only exists as pixels, do not pretend you are still automating the DOM. Build the state machine where the state actually is.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Efficient NixOS Remote Deploys with Selective Closure Copying</title>
        <published>2026-04-15T11:00:00+00:00</published>
        <updated>2026-04-15T11:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/nixos-remote-deploys-selective-closure-copying/"/>
        <id>https://perlpimp.net/blog/nixos-remote-deploys-selective-closure-copying/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/nixos-remote-deploys-selective-closure-copying/">&lt;p&gt;You run &lt;code&gt;nixos-rebuild switch --target-host&lt;&#x2F;code&gt;, wait for the build to finish, and then watch your machine upload what feels like the entire internet to a server that could have fetched most of those paths from &lt;code&gt;cache.nixos.org&lt;&#x2F;code&gt; on its own.&lt;&#x2F;p&gt;
&lt;p&gt;This is one of those Nix deployment problems that looks inevitable until you learn the flag.&lt;&#x2F;p&gt;
&lt;p&gt;If you are deploying to a remote NixOS host, the first thing to try is:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nixos-rebuild switch \
  --flake &amp;quot;.#myhost&amp;quot; \
  --target-host deploy@example.com \
  --use-remote-sudo \
  --use-substitutes
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That single &lt;code&gt;--use-substitutes&lt;&#x2F;code&gt; changes the copy behavior in a way that can make deploys dramatically faster. Instead of blindly pushing the full closure from your laptop or CI runner, the remote host gets a chance to fetch missing store paths from its own configured substituters first. In practice, that usually means the remote pulls public dependencies from &lt;code&gt;cache.nixos.org&lt;&#x2F;code&gt;, and you only upload the paths that are actually private or unavailable in cache.&lt;&#x2F;p&gt;
&lt;p&gt;That is the whole trick.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-default-behavior-is-wasteful&quot;&gt;The default behavior is wasteful&lt;&#x2F;h2&gt;
&lt;p&gt;Without substitute-on-destination behavior, a remote deploy often looks like this:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;You build locally&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;nixos-rebuild&lt;&#x2F;code&gt; copies the resulting closure to the target host&lt;&#x2F;li&gt;
&lt;li&gt;The target activates the new system&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;The problem is step two. Your local machine ends up pushing standard &lt;code&gt;nixpkgs&lt;&#x2F;code&gt; dependencies that the remote could have downloaded perfectly well on its own from a fast public cache.&lt;&#x2F;p&gt;
&lt;p&gt;So if your system closure includes &lt;code&gt;systemd&lt;&#x2F;code&gt;, &lt;code&gt;nginx&lt;&#x2F;code&gt;, &lt;code&gt;openssl&lt;&#x2F;code&gt;, &lt;code&gt;glibc&lt;&#x2F;code&gt;, and a hundred other ordinary paths, you are pointlessly using your own outbound bandwidth as a binary cache.&lt;&#x2F;p&gt;
&lt;p&gt;That hurts most on:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;home uplinks with weak upload bandwidth&lt;&#x2F;li&gt;
&lt;li&gt;remote VPS deploys over higher-latency links&lt;&#x2F;li&gt;
&lt;li&gt;CI runners pushing to many hosts&lt;&#x2F;li&gt;
&lt;li&gt;large closures where only a tiny fraction is actually custom&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;what-use-substitutes-actually-does&quot;&gt;What &lt;code&gt;--use-substitutes&lt;&#x2F;code&gt; actually does&lt;&#x2F;h2&gt;
&lt;p&gt;The high-level flag is &lt;code&gt;--use-substitutes&lt;&#x2F;code&gt; on &lt;code&gt;nixos-rebuild&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;The lower-level behavior underneath it is &lt;code&gt;nix copy --substitute-on-destination&lt;&#x2F;code&gt;. The Nix reference manual describes that flag as telling the destination SSH store to try substitutes on the destination side. That is the mechanism you want.&lt;&#x2F;p&gt;
&lt;p&gt;So conceptually, you are switching from this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;my machine -&amp;gt; push everything -&amp;gt; remote host
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;to this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;my machine -&amp;gt; push only uncached paths -&amp;gt; remote host
remote host -&amp;gt; fetch public paths -&amp;gt; cache.nixos.org &amp;#x2F; other substituters
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That is almost always the better network topology.&lt;&#x2F;p&gt;
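&lt;p&gt;As a toy model, the difference between the two topologies is just set arithmetic over store paths. This is not how Nix implements it internally. The path names are simplified placeholders for real hashed store paths:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Toy model of the two copy topologies. `remote_can_substitute` stands in for
# whatever the remote host's configured substituters can serve.


def upload_plan(closure, already_on_remote, remote_can_substitute, use_substitutes):
    """Return (uploaded_by_you, fetched_by_remote) as sorted lists."""
    missing = set(closure) - set(already_on_remote)
    if not use_substitutes:
        return sorted(missing), []                    # push everything missing
    fetched = missing.intersection(remote_can_substitute)
    return sorted(missing - fetched), sorted(fetched)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;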
&lt;h2 id=&quot;the-one-flag-version&quot;&gt;The one-flag version&lt;&#x2F;h2&gt;
&lt;p&gt;For many setups, the simple version is enough:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix run nixpkgs#nixos-rebuild -- switch \
  --flake &amp;quot;.#myhost&amp;quot; \
  --target-host deploy@example.com \
  --use-remote-sudo \
  --use-substitutes
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That keeps your deploy wrapper short and gets you most of the benefit immediately.&lt;&#x2F;p&gt;
&lt;p&gt;If your current deploy script looks like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix run nixpkgs#nixos-rebuild -- switch \
  --flake &amp;quot;.#myhost&amp;quot; \
  --target-host deploy@example.com \
  --use-remote-sudo
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;the diff is almost embarrassingly small:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;diff&quot; class=&quot;language-diff &quot;&gt;&lt;code class=&quot;language-diff&quot; data-lang=&quot;diff&quot;&gt; nix run nixpkgs#nixos-rebuild -- switch \
   --flake &amp;quot;.#myhost&amp;quot; \
   --target-host deploy@example.com \
-  --use-remote-sudo
+  --use-remote-sudo \
+  --use-substitutes
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And yet that tiny diff can be the difference between “ship the whole closure over SSH” and “let the remote fetch the boring stuff itself.”&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-trusted-user-gotcha&quot;&gt;The trusted-user gotcha&lt;&#x2F;h2&gt;
&lt;p&gt;This is the part many writeups skip.&lt;&#x2F;p&gt;
&lt;p&gt;If you deploy as a non-root user on the remote machine, that user needs to be trusted by the remote Nix daemon. Otherwise the substitute request can be ignored and you quietly fall back to the slow path.&lt;&#x2F;p&gt;
&lt;p&gt;On NixOS, that usually means:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  nix.settings.trusted-users = [ &amp;quot;root&amp;quot; &amp;quot;deploy&amp;quot; ];
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If you deploy as &lt;code&gt;root@host&lt;&#x2F;code&gt;, root is already trusted. If you deploy as &lt;code&gt;deploy@host&lt;&#x2F;code&gt; and rely on &lt;code&gt;--use-remote-sudo&lt;&#x2F;code&gt; for activation, then &lt;code&gt;deploy&lt;&#x2F;code&gt; needs to be listed in &lt;code&gt;trusted-users&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;This matches the Nix documentation more broadly: using substituters through the daemon is a privileged capability, and the calling user needs to be trusted for it to work as intended.&lt;&#x2F;p&gt;
&lt;p&gt;If you forget this step, the failure mode is annoying because it often looks like nothing is wrong. The deploy still succeeds. It is just slow, and the target host behaves as though &lt;code&gt;--use-substitutes&lt;&#x2F;code&gt; was never there.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-to-tell-whether-it-is-working&quot;&gt;How to tell whether it is working&lt;&#x2F;h2&gt;
&lt;p&gt;If your deploys are still dragging, check the basics:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# On the remote host
nix show-config | grep trusted-users   # Nix 2.20+ also spells this `nix config show`
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And pay attention to the copy phase. The fast path looks like the target is fetching public dependencies for itself. The slow path looks like one long stream of data being shoved from your local store to the remote store.&lt;&#x2F;p&gt;
&lt;p&gt;The simplest sanity check is a practical one: if adding &lt;code&gt;--use-substitutes&lt;&#x2F;code&gt; did not reduce upload volume or deploy time, the trusted-user piece is the first thing to verify.&lt;&#x2F;p&gt;
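&lt;p&gt;To do that check in a script, a sketch like this reads &lt;code&gt;nix show-config&lt;&#x2F;code&gt; output captured on the remote host. It assumes the usual &lt;code&gt;name = value&lt;&#x2F;code&gt; line format and handles &lt;code&gt;@group&lt;&#x2F;code&gt; entries. The function name is hypothetical:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Check captured `nix show-config` output for whether a deploy user is trusted.
# Pass the user's groups so that "@wheel"-style entries are honored.


def is_trusted(config_text, user, groups=()):
    """True if `user` (or one of its groups, as @group) is in trusted-users."""
    for line in config_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == "trusted-users":
            entries = value.split()
            return user in entries or any("@" + g in entries for g in groups)
    return False
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;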
&lt;h2 id=&quot;the-explicit-three-step-version&quot;&gt;The explicit three-step version&lt;&#x2F;h2&gt;
&lt;p&gt;Sometimes you want more control than &lt;code&gt;nixos-rebuild&lt;&#x2F;code&gt; gives you in one command. Maybe you want to pre-stage a system closure, inspect copy behavior, or separate build time from transfer time in your logs.&lt;&#x2F;p&gt;
&lt;p&gt;That is where the lower-level pattern is useful:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;TOPLEVEL=$(nix build &amp;quot;.#nixosConfigurations.myhost.config.system.build.toplevel&amp;quot; \
  --print-out-paths \
  --no-link)

nix copy \
  --to &amp;quot;ssh-ng:&amp;#x2F;&amp;#x2F;deploy@example.com&amp;quot; \
  --substitute-on-destination \
  &amp;quot;$TOPLEVEL&amp;quot;

nixos-rebuild switch \
  --flake &amp;quot;.#myhost&amp;quot; \
  --target-host deploy@example.com \
  --use-remote-sudo
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This gives you three distinct phases:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Build the target system locally&lt;&#x2F;li&gt;
&lt;li&gt;Copy the closure while allowing the destination to substitute what it can&lt;&#x2F;li&gt;
&lt;li&gt;Switch the remote system&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;That is handy when you are debugging performance, staging to multiple machines, or just want a clearer operational story than “one giant command did a lot of things.”&lt;&#x2F;p&gt;
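&lt;p&gt;If your deploy wrapper is written in Python rather than shell, the same three phases can be timed separately with &lt;code&gt;subprocess&lt;&#x2F;code&gt;. This is a sketch: the host, flake attribute and user are the same placeholders used in the commands above, and error handling is limited to &lt;code&gt;check=True&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;# Hypothetical deploy wrapper that runs build, copy and switch as separate,
# individually timed phases. Host, flake attribute and user are placeholders.
import subprocess
import time

FLAKE_HOST = "myhost"
TARGET = "deploy@example.com"


def phase_commands(toplevel):
    """Argument lists for the copy and switch phases."""
    copy = ["nix", "copy", "--to", f"ssh-ng://{TARGET}",
            "--substitute-on-destination", toplevel]
    switch = ["nixos-rebuild", "switch", "--flake", f".#{FLAKE_HOST}",
              "--target-host", TARGET, "--use-remote-sudo"]
    return copy, switch


def deploy():
    """Run all three phases and return seconds spent in each."""
    timings = {}
    start = time.monotonic()
    toplevel = subprocess.run(
        ["nix", "build",
         f".#nixosConfigurations.{FLAKE_HOST}.config.system.build.toplevel",
         "--print-out-paths", "--no-link"],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    timings["build"] = time.monotonic() - start
    for name, cmd in zip(("copy", "switch"), phase_commands(toplevel)):
        start = time.monotonic()
        subprocess.run(cmd, check=True)
        timings[name] = time.monotonic() - start
    return timings
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;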
&lt;p&gt;You can also dry-run the copy phase:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix copy \
  --to &amp;quot;ssh-ng:&amp;#x2F;&amp;#x2F;deploy@example.com&amp;quot; \
  --substitute-on-destination \
  --dry-run \
  &amp;quot;$TOPLEVEL&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That is a nice way to see whether you are really pushing only the private or uncached store paths.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;when-this-helps-most&quot;&gt;When this helps most&lt;&#x2F;h2&gt;
&lt;p&gt;This pattern is especially effective when your closure contains a small amount of private work layered on top of a lot of standard public dependencies.&lt;&#x2F;p&gt;
&lt;p&gt;Examples:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;a mostly normal NixOS system plus a private application package&lt;&#x2F;li&gt;
&lt;li&gt;a public service stack plus one custom overlay&lt;&#x2F;li&gt;
&lt;li&gt;a host whose secrets are handled separately but whose system packages are mostly upstream&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;In those cases, letting the destination substitute public paths turns deploys from “upload gigabytes” into “ship the weird bits.”&lt;&#x2F;p&gt;
&lt;h2 id=&quot;when-it-does-not-help-much&quot;&gt;When it does not help much&lt;&#x2F;h2&gt;
&lt;p&gt;There are real limits.&lt;&#x2F;p&gt;
&lt;p&gt;If most of your closure is custom and uncached, then the remote still has to get that data from somewhere, and that somewhere is probably you. &lt;code&gt;--use-substitutes&lt;&#x2F;code&gt; does not magically make private builds public.&lt;&#x2F;p&gt;
&lt;p&gt;It also will not help if the remote host cannot reach its substituters, or if your cache configuration is incomplete.&lt;&#x2F;p&gt;
&lt;p&gt;So the pattern is best understood as:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;use destination-side substitutes for public paths&lt;&#x2F;li&gt;
&lt;li&gt;use your own upload bandwidth for the private leftovers&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That is still a huge win, just not infinite.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;pair-it-with-a-private-cache&quot;&gt;Pair it with a private cache&lt;&#x2F;h2&gt;
&lt;p&gt;If you want the best version of this workflow, pair it with your own binary cache as well.&lt;&#x2F;p&gt;
&lt;p&gt;For example:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  nix.settings = {
    trusted-users = [ &amp;quot;root&amp;quot; &amp;quot;deploy&amp;quot; ];
    substituters = [
      &amp;quot;https:&amp;#x2F;&amp;#x2F;cache.nixos.org&amp;quot;
      &amp;quot;https:&amp;#x2F;&amp;#x2F;your-attic.example.com&amp;#x2F;main&amp;quot;
    ];
    trusted-public-keys = [
      &amp;quot;cache.nixos.org-1:6NCHdD59X431o0gWypbMrAURkbJ16ZPMQFGspcDShjY=&amp;quot;
      &amp;quot;main:your-attic-public-key-here&amp;quot;
    ];
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Now the remote can fetch both public dependencies and your private builds without waiting for a direct upload from the deployer. That is the point where remote deploys start feeling unfairly fast.&lt;&#x2F;p&gt;
&lt;p&gt;If you want to set that up yourself, I already wrote about &lt;a href=&quot;&#x2F;blog&#x2F;self-hosted-nix-binary-cache-attic-garage&#x2F;&quot;&gt;self-hosting a Nix binary cache with Attic and Garage&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-practical-rule&quot;&gt;The practical rule&lt;&#x2F;h2&gt;
&lt;p&gt;If you deploy to a remote NixOS host, add &lt;code&gt;--use-substitutes&lt;&#x2F;code&gt; first and verify that the remote deploy user is trusted.&lt;&#x2F;p&gt;
&lt;p&gt;That is the highest-leverage fix.&lt;&#x2F;p&gt;
&lt;p&gt;Once that is in place, move to the explicit &lt;code&gt;nix build&lt;&#x2F;code&gt; plus &lt;code&gt;nix copy --substitute-on-destination&lt;&#x2F;code&gt; pattern when you need more visibility or control. And if you deploy often, add a private cache so the “custom bits” stop being uploads too.&lt;&#x2F;p&gt;
&lt;p&gt;You do not need to keep re-uploading the world to machines that already know how to fetch it.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>NUR — The Distributed Package Registry Nix Deserves</title>
        <published>2026-04-14T18:00:00+00:00</published>
        <updated>2026-04-14T18:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/nur-distributed-package-registry-nix/"/>
        <id>https://perlpimp.net/blog/nur-distributed-package-registry-nix/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/nur-distributed-package-registry-nix/">&lt;p&gt;If you come from Arch, the AUR feels like the obvious model for community packaging: one big shared repository, everyone submits PKGBUILDs, and the collective package count turns into a kind of ecosystem scoreboard.&lt;&#x2F;p&gt;
&lt;p&gt;NUR takes a different path, and I think it is the better one.&lt;&#x2F;p&gt;
&lt;p&gt;NUR is not a monorepo of user packages. It is a registry of user repositories. Each contributor owns their own repo, publishes whatever packages, modules, and overlays they want, and NUR indexes the result. At the time of writing, the &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;nur.nix-community.org&#x2F;&quot;&gt;NUR package index&lt;&#x2F;a&gt; shows 5,944 packages across 369 user repositories. That is a lot of ecosystem hiding in plain sight.&lt;&#x2F;p&gt;
&lt;p&gt;If you have not explored it yet, you are probably underestimating what the Nix community has already packaged.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-nur-actually-is&quot;&gt;What NUR actually is&lt;&#x2F;h2&gt;
&lt;p&gt;The important mental model is this:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;AUR&lt;&#x2F;strong&gt; is a shared submission queue&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;NUR&lt;&#x2F;strong&gt; is a federated directory&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That difference sounds subtle until you use it.&lt;&#x2F;p&gt;
&lt;p&gt;With AUR, the package lives inside the central repo. With NUR, the package lives in the maintainer’s repo, and NUR points you to it. The registry is the index, not the home.&lt;&#x2F;p&gt;
&lt;p&gt;That makes NUR feel much more like the rest of the Nix ecosystem. You are already used to composing inputs, pinning revisions, wiring overlays together, and deciding exactly what enters your system closure. NUR fits that model naturally.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-this-architecture-is-better&quot;&gt;Why this architecture is better&lt;&#x2F;h2&gt;
&lt;p&gt;The simplest comparison looks like this:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;AUR&lt;&#x2F;th&gt;&lt;th&gt;NUR&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;One monolithic community repo&lt;&#x2F;td&gt;&lt;td&gt;Many independent repos collected in one index&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Submit PKGBUILDs into the shared tree&lt;&#x2F;td&gt;&lt;td&gt;Publish in your own repo&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Centralized packaging workflow&lt;&#x2F;td&gt;&lt;td&gt;Federated packaging workflow&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Mostly package recipes&lt;&#x2F;td&gt;&lt;td&gt;Packages, overlays, modules, and repo-specific conventions&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Trust is broad and social&lt;&#x2F;td&gt;&lt;td&gt;Trust can be selective and explicit&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;That last point matters more than it first appears.&lt;&#x2F;p&gt;
&lt;p&gt;In NUR, you do not have to think in all-or-nothing terms. You can consume the whole registry via the NUR overlay if you want the giant package namespace, but you can also pull in one maintainer’s repo directly, pin it to a commit, and use only the parts you care about. That is a better fit for how serious Nix users already manage risk and reproducibility.&lt;&#x2F;p&gt;
&lt;p&gt;It is not that AUR is bad. The AUR is one of Arch’s best ideas. But NUR solves the same community-packaging problem in a way that feels much more native to Nix.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;discoverability-is-much-better-than-people-assume&quot;&gt;Discoverability is much better than people assume&lt;&#x2F;h2&gt;
&lt;p&gt;One of the classic arguments for a monorepo is discoverability: if everything lives in one place, surely it is easier to find.&lt;&#x2F;p&gt;
&lt;p&gt;But NUR already gives you the searchable index people actually need. &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;nur.nix-community.org&#x2F;&quot;&gt;nur.nix-community.org&lt;&#x2F;a&gt; lets you search across the full registry, browse repositories, and inspect what maintainers are publishing. You get the discovery benefits of aggregation without forcing everyone into one git history and one submission model.&lt;&#x2F;p&gt;
&lt;p&gt;That turns out to be a very good trade.&lt;&#x2F;p&gt;
&lt;p&gt;You can stumble across exactly the kind of wonderfully niche packaging work that thrives in Nix:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Prometheus exporters that are too specific for &lt;code&gt;nixpkgs&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Firefox extension collections packaged as reproducible derivations&lt;&#x2F;li&gt;
&lt;li&gt;SDDM themes that become one declarative option instead of manual file copying&lt;&#x2F;li&gt;
&lt;li&gt;Small infrastructure tools, IRC bots, relay daemons, and odd little utilities that absolutely belong somewhere even if they are not mainstream enough for &lt;code&gt;nixpkgs&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That is where NUR shines. It is the long tail of the Nix ecosystem, but with structure.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;it-is-not-just-packages&quot;&gt;It is not just packages&lt;&#x2F;h2&gt;
&lt;p&gt;This is the real differentiator.&lt;&#x2F;p&gt;
&lt;p&gt;AUR gives you install recipes. NUR repos can give you full Nix building blocks: packages, overlays, NixOS modules, Home Manager modules, helper libraries, and flake outputs.&lt;&#x2F;p&gt;
&lt;p&gt;That means the thing you discover is often not just “here is a binary.” It is “here is a binary, plus a module, plus sane service wiring, plus options, plus a reusable overlay if you want to patch it.”&lt;&#x2F;p&gt;
&lt;p&gt;That is a much richer packaging story.&lt;&#x2F;p&gt;
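&lt;p&gt;On the publishing side, a module is just another file in the repo. Here is a minimal sketch of what one might look like. It uses a made-up &lt;code&gt;services.hello-demo&lt;&#x2F;code&gt; option, with the stock &lt;code&gt;hello&lt;&#x2F;code&gt; package from nixpkgs as a stand-in binary. Neither comes from a real NUR repo:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# modules&amp;#x2F;hello-demo.nix (hypothetical example module)
{ config, lib, pkgs, ... }:

let
  cfg = config.services.hello-demo;
in
{
  # The option a consumer flips on in their own host configuration.
  options.services.hello-demo.enable =
    lib.mkEnableOption &amp;quot;the hello-demo example service&amp;quot;;

  # The service wiring only applies when that option is enabled.
  config = lib.mkIf cfg.enable {
    systemd.services.hello-demo = {
      description = &amp;quot;hello-demo example service&amp;quot;;
      wantedBy = [ &amp;quot;multi-user.target&amp;quot; ];
      serviceConfig = {
        Type = &amp;quot;oneshot&amp;quot;;
        DynamicUser = true;
        ExecStart = &amp;quot;${pkgs.hello}&amp;#x2F;bin&amp;#x2F;hello&amp;quot;;
      };
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A repo would then typically expose it from its flake as &lt;code&gt;nixosModules.hello-demo = import .&#x2F;modules&#x2F;hello-demo.nix;&lt;&#x2F;code&gt;, so consumers can import the module by name.&lt;&#x2F;p&gt;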
&lt;p&gt;For example, if you consume a NUR repo directly as a flake input, you can wire its modules straight into your system:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs = {
    nixpkgs.url = &amp;quot;github:NixOS&amp;#x2F;nixpkgs&amp;#x2F;nixos-unstable&amp;quot;;
    nur-ijohanne.url = &amp;quot;github:ijohanne&amp;#x2F;nur-packages&amp;quot;;
  };

  outputs = { nixpkgs, nur-ijohanne, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      system = &amp;quot;x86_64-linux&amp;quot;;
      modules = [
        nur-ijohanne.nixosModules.prometheus-hue-exporter
        {
          services.prometheus-hue-exporter.enable = true;
        }
      ];
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And if you want the big shared namespace instead, the upstream NUR overlay works too:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs.nur.url = &amp;quot;github:nix-community&amp;#x2F;NUR&amp;quot;;

  outputs = { nixpkgs, nur, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      system = &amp;quot;x86_64-linux&amp;quot;;
      modules = [
        nur.modules.nixos.default
        ({ pkgs, ... }: {
          environment.systemPackages = [
            pkgs.nur.repos.mic92.goatcounter
            pkgs.nur.repos.ijohanne.zot
          ];
        })
      ];
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That flexibility is the point. You can browse globally and consume locally.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;a-concrete-case-study-my-own-nur-repo&quot;&gt;A concrete case study: my own NUR repo&lt;&#x2F;h2&gt;
&lt;p&gt;My repository at &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;ijohanne&#x2F;nur-packages&quot;&gt;ijohanne&#x2F;nur-packages&lt;&#x2F;a&gt; is exactly why I like this model.&lt;&#x2F;p&gt;
&lt;p&gt;It contains a mix of things that would be awkward to pitch as one giant upstream contribution story:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Prometheus exporters for Hue, Netatmo, nftables, TeamSpeak3, Ecowitt, TP-Link P110, and PostgreSQL&lt;&#x2F;li&gt;
&lt;li&gt;NixOS modules for those exporters, not just package derivations&lt;&#x2F;li&gt;
&lt;li&gt;A hardened Firefox profile and curated Firefox addon packaging&lt;&#x2F;li&gt;
&lt;li&gt;Fish and Vim plugin sets&lt;&#x2F;li&gt;
&lt;li&gt;SDDM themes&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;zot&lt;&#x2F;code&gt;, &lt;code&gt;multicast-relay&lt;&#x2F;code&gt;, and other niche infrastructure tools&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;agent-skills-cli&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That is a coherent repo because it is &lt;em&gt;my&lt;&#x2F;em&gt; repo. It reflects what I run, maintain, and care about. And that is precisely why the federated model works: each maintainer can publish a package universe that makes sense for them without asking a central project to absorb the whole identity of it.&lt;&#x2F;p&gt;
&lt;p&gt;NUR turns that into something discoverable and reusable for everybody else.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-trust-model-is-nicer&quot;&gt;The trust model is nicer&lt;&#x2F;h2&gt;
&lt;p&gt;In practice, NUR gives you a more granular trust decision than a central community repo.&lt;&#x2F;p&gt;
&lt;p&gt;You can decide:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;which repos you want to use&lt;&#x2F;li&gt;
&lt;li&gt;which commits you want to pin&lt;&#x2F;li&gt;
&lt;li&gt;whether you want the full overlay or a single upstream&lt;&#x2F;li&gt;
&lt;li&gt;whether you want only packages or also modules and overlays&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That maps directly to how Nix users already think. Reproducibility is not just about exact builds. It is also about being explicit about your inputs.&lt;&#x2F;p&gt;
&lt;p&gt;With NUR, the social layer and the technical layer line up more cleanly. You can say “I trust this maintainer’s repo for this host” in an actual, concrete, pin-able way.&lt;&#x2F;p&gt;
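&lt;p&gt;&lt;code&gt;flake.lock&lt;&#x2F;code&gt; already records an exact revision for every input. You can also write the commit into &lt;code&gt;flake.nix&lt;&#x2F;code&gt; itself, so the trust decision shows up in review. The sketch below reuses the module from the earlier flake example. &lt;code&gt;COMMIT_SHA&lt;&#x2F;code&gt; is a placeholder for a commit you have actually read:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs = {
    nixpkgs.url = &amp;quot;github:NixOS&amp;#x2F;nixpkgs&amp;#x2F;nixos-unstable&amp;quot;;
    # Placeholder: replace COMMIT_SHA with the full hash of a commit you reviewed.
    nur-ijohanne.url = &amp;quot;github:ijohanne&amp;#x2F;nur-packages&amp;#x2F;COMMIT_SHA&amp;quot;;
  };

  outputs = { nixpkgs, nur-ijohanne, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      system = &amp;quot;x86_64-linux&amp;quot;;
      # This host trusts that one commit, and only uses one module from it.
      modules = [
        nur-ijohanne.nixosModules.prometheus-hue-exporter
        { services.prometheus-hue-exporter.enable = true; }
      ];
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With the revision in the URL, &lt;code&gt;nix flake update&lt;&#x2F;code&gt; leaves that input alone. Moving it forward becomes a deliberate edit.&lt;&#x2F;p&gt;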
&lt;h2 id=&quot;how-to-create-your-own-nur-repo&quot;&gt;How to create your own NUR repo&lt;&#x2F;h2&gt;
&lt;p&gt;The official &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;nix-community&#x2F;NUR&quot;&gt;NUR repository&lt;&#x2F;a&gt; expects each upstream to expose a top-level &lt;code&gt;default.nix&lt;&#x2F;code&gt;. In practice, many repos also expose a flake so they are easy to consume directly outside the shared overlay.&lt;&#x2F;p&gt;
&lt;p&gt;At the minimum, your package repo can look like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ pkgs }:
{
  hello-nur = pkgs.callPackage .&amp;#x2F;pkgs&amp;#x2F;hello-nur { };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
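&lt;p&gt;That &lt;code&gt;callPackage&lt;&#x2F;code&gt; line expects a package expression at &lt;code&gt;pkgs&#x2F;hello-nur&#x2F;default.nix&lt;&#x2F;code&gt;. A deliberately tiny, hypothetical one could look like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# pkgs&amp;#x2F;hello-nur&amp;#x2F;default.nix (hypothetical package)
{ writeShellScriptBin }:

# callPackage fills in writeShellScriptBin from pkgs; the result installs bin&amp;#x2F;hello-nur.
writeShellScriptBin &amp;quot;hello-nur&amp;quot; &amp;#x27;&amp;#x27;
  echo &amp;quot;hello from my NUR repo&amp;quot;
&amp;#x27;&amp;#x27;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A real tool would swap in the matching nixpkgs builder, such as &lt;code&gt;buildGoModule&lt;&#x2F;code&gt; or &lt;code&gt;rustPlatform.buildRustPackage&lt;&#x2F;code&gt;. The wiring in &lt;code&gt;default.nix&lt;&#x2F;code&gt; stays the same.&lt;&#x2F;p&gt;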
&lt;p&gt;If you want a flake interface too, you can layer that on top:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  description = &amp;quot;My NUR packages&amp;quot;;

  inputs.nixpkgs.url = &amp;quot;github:NixOS&amp;#x2F;nixpkgs&amp;#x2F;nixos-unstable&amp;quot;;

  outputs = { self, nixpkgs }:
    let
      system = &amp;quot;x86_64-linux&amp;quot;;
      pkgs = import nixpkgs { inherit system; };
    in {
      legacyPackages.${system} = import .&amp;#x2F;default.nix { inherit pkgs; };
    };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
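&lt;p&gt;That flake only evaluates for &lt;code&gt;x86_64-linux&lt;&#x2F;code&gt;. If you also publish for other systems, or want &lt;code&gt;nix flake check&lt;&#x2F;code&gt; to see your packages, a variation along these lines should work. The system list is an assumption. The &lt;code&gt;isDerivation&lt;&#x2F;code&gt; filter is there because &lt;code&gt;packages&lt;&#x2F;code&gt; is expected to hold only derivations, while a NUR &lt;code&gt;default.nix&lt;&#x2F;code&gt; may also return things like &lt;code&gt;modules&lt;&#x2F;code&gt; or &lt;code&gt;overlays&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  description = &amp;quot;My NUR packages&amp;quot;;

  inputs.nixpkgs.url = &amp;quot;github:NixOS&amp;#x2F;nixpkgs&amp;#x2F;nixos-unstable&amp;quot;;

  outputs = { self, nixpkgs }:
    let
      # Assumption: the systems you actually want to build for.
      systems = [ &amp;quot;x86_64-linux&amp;quot; &amp;quot;aarch64-linux&amp;quot; ];
      # Call f with a nixpkgs instance for each system, keyed by system name.
      forAllSystems = f: nixpkgs.lib.genAttrs systems
        (system: f (import nixpkgs { inherit system; }));
    in {
      # Everything default.nix returns, per system.
      legacyPackages = forAllSystems (pkgs: import .&amp;#x2F;default.nix { inherit pkgs; });

      # Only actual derivations, which is what nix flake check expects here.
      packages = forAllSystems (pkgs:
        nixpkgs.lib.filterAttrs (_: nixpkgs.lib.isDerivation)
          (import .&amp;#x2F;default.nix { inherit pkgs; }));
    };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;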
&lt;p&gt;Once the repo exists, you add it to &lt;code&gt;repos.json&lt;&#x2F;code&gt; in &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;nix-community&#x2F;NUR&quot;&gt;nix-community&#x2F;NUR&lt;&#x2F;a&gt;, open a PR, and your repo becomes part of the index. If you want updates to show up faster than the normal refresh cycle, NUR also exposes the &lt;code&gt;nur-update.nix-community.org&lt;&#x2F;code&gt; hook described in the upstream docs.&lt;&#x2F;p&gt;
&lt;p&gt;That onboarding path is refreshingly lightweight. You own your repo. NUR just helps people find it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-this-matters&quot;&gt;Why this matters&lt;&#x2F;h2&gt;
&lt;p&gt;The deeper story here is not “look, another package collection.”&lt;&#x2F;p&gt;
&lt;p&gt;The deeper story is that Nix packaging is already more composable than most ecosystems, and NUR preserves that composability instead of flattening it into one central contribution queue. It lets community packaging stay decentralized without becoming undiscoverable.&lt;&#x2F;p&gt;
&lt;p&gt;That is exactly the shape this ecosystem wants.&lt;&#x2F;p&gt;
&lt;p&gt;If you have only used &lt;code&gt;nixpkgs&lt;&#x2F;code&gt; plus the occasional flake input, spend half an hour browsing NUR. You will find packages you did not know existed, modules you did not realize someone had already written, and repos maintained by people solving the same weird little problems you are solving.&lt;&#x2F;p&gt;
&lt;p&gt;That is what a healthy package ecosystem looks like.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>My Codex App Workflow: Triage, Fix Threads, and Just Enough Research</title>
        <published>2026-04-13T00:14:00+00:00</published>
        <updated>2026-04-13T00:14:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/codex-app-thread-workflow/"/>
        <id>https://perlpimp.net/blog/codex-app-thread-workflow/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/codex-app-thread-workflow/">&lt;p&gt;There are two bad defaults when you start doing real work with a coding agent.&lt;&#x2F;p&gt;
&lt;p&gt;The first is one giant thread where everything happens forever. Ideas, debugging, random notes, half-formed plans, implementation, follow-up fixes. It feels convenient right up until the thread turns into soup and you cannot remember where a decision was made or which bit of context still matters.&lt;&#x2F;p&gt;
&lt;p&gt;The second is going full command-center mode with elaborate agent choreography for every piece of work. Parallel researchers, planners, implementers, reviewers, all passing notes to each other like you are running a tiny software consultancy inside your terminal. That can be impressive. It can also be a very efficient way to burn time and attention on process that does not actually help.&lt;&#x2F;p&gt;
&lt;p&gt;What I have settled into with the Codex app is a middle path: one mostly long-lived &lt;code&gt;Triage&lt;&#x2F;code&gt; thread, one focused &lt;code&gt;Fix issue-XXX&lt;&#x2F;code&gt; thread for each piece of implementation work, and occasional &lt;code&gt;Research XX&lt;&#x2F;code&gt; threads when I genuinely need to go learn something first.&lt;&#x2F;p&gt;
&lt;p&gt;It is not fancy. That is exactly why it works.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-problem-with-giant-threads&quot;&gt;The problem with giant threads&lt;&#x2F;h2&gt;
&lt;p&gt;Long threads decay. Not immediately, but predictably.&lt;&#x2F;p&gt;
&lt;p&gt;At the start, a single thread feels efficient because everything is “right there”. After a while it becomes a junk drawer. A bug report sits next to a design question, which sits next to a shell transcript, which sits next to a half-baked idea you had at midnight. The agent can still technically see all of it, but that does not mean the context is useful.&lt;&#x2F;p&gt;
&lt;p&gt;What I care about in day-to-day use is not maximum theoretical continuity. I care about keeping the current task legible. I want the thread title to tell me what I am doing. I want the history to be easy to skim later. And I want the working context to stay small enough that the app remains fast and pleasant.&lt;&#x2F;p&gt;
&lt;p&gt;That means I do not want one immortal everything-thread.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-triage-thread-as-the-inbox&quot;&gt;The &lt;code&gt;Triage&lt;&#x2F;code&gt; thread as the inbox&lt;&#x2F;h2&gt;
&lt;p&gt;The one long-lived exception is &lt;code&gt;Triage&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;That thread is my inbox. It is where I dump observations, rough ideas, “this seems off” notes, and small bits of investigation that are not yet implementation work. If I notice a bug while doing something else, it goes there. If I want to capture an idea for a post or a refactor without acting on it immediately, it goes there too.&lt;&#x2F;p&gt;
&lt;p&gt;This has effectively replaced the old &lt;code&gt;&#x2F;btw&lt;&#x2F;code&gt; habit I had in Claude Code. The point is the same: capture now, sort later. The difference is that in the Codex app I like having a dedicated thread that stays open and accumulates that intake work over time.&lt;&#x2F;p&gt;
&lt;p&gt;The important thing is that &lt;code&gt;Triage&lt;&#x2F;code&gt; is not where I finish work. It is where I decide what the work is.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;github-issues-sits-in-the-middle&quot;&gt;GitHub Issues sits in the middle&lt;&#x2F;h2&gt;
&lt;p&gt;From &lt;code&gt;Triage&lt;&#x2F;code&gt;, I file actual tasks into GitHub Issues, or any similar issue tracker an agent can handle cleanly.&lt;&#x2F;p&gt;
&lt;p&gt;That is the durable ledger. Threads are great for active conversation and local context. They are not where I want my task list to live. I want a real issue tracker that I can search, sort, sync, and return to later without mentally reconstructing a conversation.&lt;&#x2F;p&gt;
&lt;p&gt;So the flow is simple:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Something comes up in &lt;code&gt;Triage&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;If it is real work, I file a GitHub issue.&lt;&#x2F;li&gt;
&lt;li&gt;The issue becomes the handoff point from vague thought to concrete task.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;That separation matters. It keeps the thread free to be conversational, while the issue tracker stays authoritative about what exists, what is in progress, and what still needs doing.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;one-fix-issue-xxx-thread-per-implementation-task&quot;&gt;One &lt;code&gt;Fix issue-XXX&lt;&#x2F;code&gt; thread per implementation task&lt;&#x2F;h2&gt;
&lt;p&gt;Once I actually start working an issue, I spin up a separate thread named something like &lt;code&gt;Fix issue-2od&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;This is probably the single habit that improved the experience the most.&lt;&#x2F;p&gt;
&lt;p&gt;Instead of dragging a whole triage history behind me, I get a clean workspace focused on one job. The thread name is searchable. The history is about one thing. If I need to come back in a week, I do not have to decode whether a command or explanation was related to this issue or some other thing that happened nearby.&lt;&#x2F;p&gt;
&lt;p&gt;This also makes the agent feel better to use. Keeping the active context narrow tends to keep responses snappier and the conversation more relevant. I do not need the model to remember every adjacent concern I had this month. I need it to help me finish the thing in front of me.&lt;&#x2F;p&gt;
&lt;p&gt;Thread naming is part of the system here, not an afterthought. If I can scan the sidebar and immediately see &lt;code&gt;Triage&lt;&#x2F;code&gt;, &lt;code&gt;Fix issue-2od&lt;&#x2F;code&gt;, &lt;code&gt;Fix issue-31a&lt;&#x2F;code&gt;, and &lt;code&gt;Research Zola feeds&lt;&#x2F;code&gt;, I have already reduced a bunch of cognitive load before opening anything.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;perlpimp.net&#x2F;blog&#x2F;codex-app-thread-workflow&#x2F;triage-threads-sidebar.png&quot; alt=&quot;Codex app sidebar with triage, fix, and research threads&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;h2 id=&quot;research-xx-threads-are-for-uncertainty-not-theater&quot;&gt;&lt;code&gt;Research XX&lt;&#x2F;code&gt; threads are for uncertainty, not theater&lt;&#x2F;h2&gt;
&lt;p&gt;I do sometimes open separate research threads, but only when the task is genuinely exploratory.&lt;&#x2F;p&gt;
&lt;p&gt;If I need to understand a library, compare approaches, or verify how something works before I decide what to implement, that gets a &lt;code&gt;Research XX&lt;&#x2F;code&gt; thread. The key is that research is its own mode. It has a different shape than implementation, and I would rather keep that separate than dilute a fix thread with a bunch of speculative branching.&lt;&#x2F;p&gt;
&lt;p&gt;These research threads often end the same way: by producing one or more new GitHub issues.&lt;&#x2F;p&gt;
&lt;p&gt;That is useful because it means the exploration is not just “context”. It turns into concrete follow-up work with clear boundaries. Once that happens, I can close the research loop and go back to the per-issue thread pattern.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-this-works-better-than-over-orchestration&quot;&gt;Why this works better than over-orchestration&lt;&#x2F;h2&gt;
&lt;p&gt;This workflow is intentionally not an agent-swarm post.&lt;&#x2F;p&gt;
&lt;p&gt;I am not trying to simulate a company org chart inside the app. I do not want elaborate coordination machinery unless the problem genuinely demands it. Most day-to-day engineering work benefits more from clear boundaries than from more moving parts.&lt;&#x2F;p&gt;
&lt;p&gt;One inbox thread. One issue tracker. One focused implementation thread per issue. Research threads when they earn their keep.&lt;&#x2F;p&gt;
&lt;p&gt;That is enough structure to keep things tidy without making the workflow feel ceremonial.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;small-context-faster-app-better-history&quot;&gt;Small context, faster app, better history&lt;&#x2F;h2&gt;
&lt;p&gt;The biggest benefit is that everything stays small.&lt;&#x2F;p&gt;
&lt;p&gt;Small context means the active thread is easier to reason about. It usually means the app feels faster. It means when I search later, I find a thread whose title actually matches the work that happened inside it. And it means I do not dread opening old conversations because each one has a clear purpose.&lt;&#x2F;p&gt;
&lt;p&gt;That, more than anything, is what I want from a practical coding-agent workflow. Not maximal sophistication. Not a benchmark-optimized orchestration strategy. Just a system that stays readable, searchable, and pleasant after the novelty wears off.&lt;&#x2F;p&gt;
&lt;p&gt;For me, this is the sweet spot: enough structure to keep momentum, not so much structure that the workflow becomes its own hobby.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Monitoring nftables Firewall Rules with Prometheus and Grafana</title>
        <published>2026-04-12T00:00:00+00:00</published>
        <updated>2026-04-12T00:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/monitoring-nftables-prometheus-grafana/"/>
        <id>https://perlpimp.net/blog/monitoring-nftables-prometheus-grafana/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/monitoring-nftables-prometheus-grafana/">&lt;p&gt;nftables already counts packets and bytes for you. The frustrating part is that it keeps that data trapped inside &lt;code&gt;nft list ruleset&lt;&#x2F;code&gt;, tied to handle numbers that are useless unless you are staring at the exact ruleset on the exact machine at the exact moment something went wrong.&lt;&#x2F;p&gt;
&lt;p&gt;That is fine for one-off debugging. It is terrible for operations.&lt;&#x2F;p&gt;
&lt;p&gt;If you run your own router or firewall, you eventually want the same thing you want everywhere else: time-series data, alerting, dashboards, and enough context to tell the difference between “the WAN is noisy tonight” and “something is hammering the guest network and getting dropped exactly as intended.”&lt;&#x2F;p&gt;
&lt;p&gt;The trick that makes this workable is surprisingly small: put a &lt;code&gt;comment&lt;&#x2F;code&gt; on every nftables rule, then export that comment as a Prometheus label.&lt;&#x2F;p&gt;
&lt;p&gt;Once you do that, you stop looking at this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;nftables_rule_packets{table=&amp;quot;filter&amp;quot;,chain=&amp;quot;input&amp;quot;,handle=&amp;quot;15&amp;quot;} 48291
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;and start looking at this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;nftables_rule_packets{table=&amp;quot;filter&amp;quot;,chain=&amp;quot;input&amp;quot;,handle=&amp;quot;15&amp;quot;,comment=&amp;quot;wan drop&amp;quot;,action=&amp;quot;drop&amp;quot;} 48291
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That one extra label turns an opaque counter into something you can reason about at a glance.&lt;&#x2F;p&gt;
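&lt;p&gt;That label is also what lets alerts key on intent. Below is a sketch of a Prometheus alerting rule declared through the NixOS &lt;code&gt;services.prometheus.rules&lt;&#x2F;code&gt; option. It assumes the exporter publishes &lt;code&gt;nftables_rule_packets&lt;&#x2F;code&gt; as an ever-increasing counter, so &lt;code&gt;rate()&lt;&#x2F;code&gt; applies. The alert name and the 100 packets per second threshold are arbitrary:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  # Prometheus rule files are YAML; JSON is valid YAML, so builtins.toJSON works here.
  services.prometheus.rules = [
    (builtins.toJSON {
      groups = [{
        name = &amp;quot;nftables&amp;quot;;
        rules = [{
          alert = &amp;quot;GuestDropsHigh&amp;quot;;
          # Selects by comment label, so renumbered handles do not break it.
          expr = &amp;#x27;&amp;#x27;sum(rate(nftables_rule_packets{comment=&amp;quot;guest drop&amp;quot;}[5m])) &amp;gt; 100&amp;#x27;&amp;#x27;;
          for = &amp;quot;10m&amp;quot;;
          labels.severity = &amp;quot;warning&amp;quot;;
          annotations.summary = &amp;quot;Guest network traffic is hitting the guest drop rule&amp;quot;;
        }];
      }];
    })
  ];
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;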
&lt;h2 id=&quot;why-handle-numbers-are-the-wrong-abstraction&quot;&gt;Why handle numbers are the wrong abstraction&lt;&#x2F;h2&gt;
&lt;p&gt;nftables exposes counters per rule with the &lt;code&gt;counter&lt;&#x2F;code&gt; keyword. The data is there, but it is buried in the live ruleset:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# -a also prints each rule&amp;#x27;s handle number
sudo nft -a list ruleset
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That gives you a snapshot, not a history. You cannot ask what your drop rate looked like over the last 24 hours. You cannot correlate a spike in blocked packets with a WireGuard handshake failure, a Prometheus alert, or some guest VLAN misbehavior. And if your dashboards key off handle numbers, a rebuild or ruleset reload can make them meaningless.&lt;&#x2F;p&gt;
&lt;p&gt;Comments solve the identity problem.&lt;&#x2F;p&gt;
&lt;p&gt;Handles are implementation details. Comments are intent.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;comments-as-labels&quot;&gt;Comments as labels&lt;&#x2F;h2&gt;
&lt;p&gt;The rule here is simple: every rule that matters gets both a &lt;code&gt;counter&lt;&#x2F;code&gt; and a &lt;code&gt;comment&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;I keep the comments short, lowercase, and boring on purpose. Two to four words is usually enough. The pattern is generally &lt;code&gt;&amp;lt;zone&#x2F;interface&amp;gt; &amp;lt;action&#x2F;purpose&amp;gt;&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;# Format: &amp;quot;&amp;lt;zone&amp;#x2F;interface&amp;gt; &amp;lt;action&amp;#x2F;purpose&amp;gt;&amp;quot; or &amp;quot;&amp;lt;description&amp;gt;&amp;quot;
ct state invalid counter drop comment &amp;quot;invalid state&amp;quot;
ip saddr 10.0.0.0&amp;#x2F;8 tcp dport 53 counter accept comment &amp;quot;lan dns tcp&amp;quot;
ip saddr 10.0.0.0&amp;#x2F;8 udp dport 53 counter accept comment &amp;quot;lan dns udp&amp;quot;
ip saddr 10.0.0.0&amp;#x2F;8 ip protocol icmp counter accept comment &amp;quot;lan icmp&amp;quot;
iifname { &amp;quot;ppp0&amp;quot;, &amp;quot;mobile&amp;quot; } icmp type echo-request limit rate 5&amp;#x2F;second burst 10 packets counter accept comment &amp;quot;wan ping ratelimit&amp;quot;
iifname { &amp;quot;ppp0&amp;quot;, &amp;quot;mobile&amp;quot; } ip protocol icmp counter drop comment &amp;quot;wan icmp drop&amp;quot;
ip saddr 0.0.0.0&amp;#x2F;0 udp dport 51820 counter accept comment &amp;quot;wireguard&amp;quot;
iifname { &amp;quot;wifi&amp;quot;, &amp;quot;wired&amp;quot;, &amp;quot;mgnt&amp;quot;, &amp;quot;enp1s0f1&amp;quot;, &amp;quot;lo&amp;quot;, &amp;quot;wg0&amp;quot; } counter accept comment &amp;quot;trusted ifaces&amp;quot;
iifname &amp;quot;guest&amp;quot; udp dport { 53, 67, 68 } counter accept comment &amp;quot;guest dns+dhcp&amp;quot;
iifname &amp;quot;guest&amp;quot; counter drop comment &amp;quot;guest drop&amp;quot;
iifname &amp;quot;camera&amp;quot; counter drop comment &amp;quot;camera drop&amp;quot;
iifname &amp;quot;ppp0&amp;quot; ct state { established, related } counter accept comment &amp;quot;wan established&amp;quot;
iifname &amp;quot;ppp0&amp;quot; counter drop comment &amp;quot;wan drop&amp;quot;
log prefix &amp;quot;nft-input-drop: &amp;quot; counter drop comment &amp;quot;default drop&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The useful properties of this convention are:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Every rule you care about is observable.&lt;&#x2F;li&gt;
&lt;li&gt;The label is stable even when nftables handle numbers change.&lt;&#x2F;li&gt;
&lt;li&gt;The label is human-readable in Grafana legends and PromQL results.&lt;&#x2F;li&gt;
&lt;li&gt;The label lines up with your mental model of the firewall instead of nftables internals.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The last point matters more than it sounds. “wan drop” tells you what the rule is for. “handle 15” tells you where to start digging.&lt;&#x2F;p&gt;
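&lt;p&gt;If the ruleset is generated from Nix, the convention can be enforced instead of remembered. Here is a hypothetical helper that always adds &lt;code&gt;counter&lt;&#x2F;code&gt; and refuses an empty comment. &lt;code&gt;mkRule&lt;&#x2F;code&gt; is not part of nftables or NixOS:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;let
  # Hypothetical helper: every rule it builds gets a counter and a non-empty comment.
  mkRule = { match ? &amp;quot;&amp;quot;, verdict, comment }:
    assert comment != &amp;quot;&amp;quot;;
    (if match == &amp;quot;&amp;quot; then &amp;quot;&amp;quot; else &amp;quot;${match} &amp;quot;)
    + &amp;quot;counter ${verdict} comment \&amp;quot;${comment}\&amp;quot;&amp;quot;;
in
builtins.concatStringsSep &amp;quot;\n&amp;quot; [
  (mkRule { match = &amp;quot;ct state invalid&amp;quot;; verdict = &amp;quot;drop&amp;quot;; comment = &amp;quot;invalid state&amp;quot;; })
  (mkRule { match = &amp;quot;iifname \&amp;quot;guest\&amp;quot; udp dport { 53, 67, 68 }&amp;quot;; verdict = &amp;quot;accept&amp;quot;; comment = &amp;quot;guest dns+dhcp&amp;quot;; })
  (mkRule { match = &amp;quot;iifname \&amp;quot;ppp0\&amp;quot;&amp;quot;; verdict = &amp;quot;drop&amp;quot;; comment = &amp;quot;wan drop&amp;quot;; })
]
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Evaluated, this yields the same three rule lines as the &lt;code&gt;invalid state&lt;&#x2F;code&gt;, &lt;code&gt;guest dns+dhcp&lt;&#x2F;code&gt;, and &lt;code&gt;wan drop&lt;&#x2F;code&gt; rules in the listing above. A rule with an empty comment fails evaluation instead of shipping.&lt;&#x2F;p&gt;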
&lt;h2 id=&quot;a-complete-input-chain-with-comments&quot;&gt;A complete input chain with comments&lt;&#x2F;h2&gt;
&lt;p&gt;This is what the pattern looks like in practice on a router with trusted interfaces, guest isolation, a camera network, and WireGuard:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;chain input {
  type filter hook input priority filter; policy drop;
  ct state invalid counter drop comment &amp;quot;invalid state&amp;quot;
  ip saddr 10.0.0.0&amp;#x2F;8 tcp dport 53 counter accept comment &amp;quot;lan dns tcp&amp;quot;
  ip saddr 10.0.0.0&amp;#x2F;8 udp dport 53 counter accept comment &amp;quot;lan dns udp&amp;quot;
  ip saddr 10.0.0.0&amp;#x2F;8 ip protocol icmp counter accept comment &amp;quot;lan icmp&amp;quot;
  iifname { &amp;quot;ppp0&amp;quot;, &amp;quot;mobile&amp;quot; } icmp type { destination-unreachable, time-exceeded, parameter-problem, echo-reply } counter accept comment &amp;quot;wan icmp replies&amp;quot;
  iifname { &amp;quot;ppp0&amp;quot;, &amp;quot;mobile&amp;quot; } icmp type echo-request limit rate 5&amp;#x2F;second burst 10 packets counter accept comment &amp;quot;wan ping ratelimit&amp;quot;
  iifname { &amp;quot;ppp0&amp;quot;, &amp;quot;mobile&amp;quot; } ip protocol icmp counter drop comment &amp;quot;wan icmp drop&amp;quot;
  ip saddr 0.0.0.0&amp;#x2F;0 udp dport 51820 counter accept comment &amp;quot;wireguard&amp;quot;
  iifname { &amp;quot;wifi&amp;quot;, &amp;quot;wired&amp;quot;, &amp;quot;mgnt&amp;quot;, &amp;quot;enp1s0f1&amp;quot;, &amp;quot;lo&amp;quot;, &amp;quot;wg0&amp;quot; } counter accept comment &amp;quot;trusted ifaces&amp;quot;
  iifname &amp;quot;guest&amp;quot; udp dport { 53, 67, 68 } counter accept comment &amp;quot;guest dns+dhcp&amp;quot;
  iifname &amp;quot;guest&amp;quot; tcp dport 53 counter accept comment &amp;quot;guest dns tcp&amp;quot;
  iifname &amp;quot;guest&amp;quot; ct state { established, related } counter accept comment &amp;quot;guest established&amp;quot;
  iifname &amp;quot;guest&amp;quot; counter drop comment &amp;quot;guest drop&amp;quot;
  iifname &amp;quot;camera&amp;quot; udp dport { 53, 67, 68 } counter accept comment &amp;quot;camera dns+dhcp&amp;quot;
  iifname &amp;quot;camera&amp;quot; tcp dport 53 counter accept comment &amp;quot;camera dns tcp&amp;quot;
  iifname &amp;quot;camera&amp;quot; ct state { established, related } counter accept comment &amp;quot;camera established&amp;quot;
  iifname &amp;quot;camera&amp;quot; counter drop comment &amp;quot;camera drop&amp;quot;
  ip protocol igmp counter accept comment &amp;quot;igmp&amp;quot;
  ip saddr 224.0.0.0&amp;#x2F;4 counter accept comment &amp;quot;multicast&amp;quot;
  iifname &amp;quot;ppp0&amp;quot; ct state { established, related } counter accept comment &amp;quot;wan established&amp;quot;
  iifname &amp;quot;ppp0&amp;quot; counter drop comment &amp;quot;wan drop&amp;quot;
  iifname &amp;quot;mobile&amp;quot; ct state { established, related } counter accept comment &amp;quot;mobile established&amp;quot;
  iifname &amp;quot;mobile&amp;quot; counter drop comment &amp;quot;mobile drop&amp;quot;
  log prefix &amp;quot;nft-input-drop: &amp;quot; counter drop comment &amp;quot;default drop&amp;quot;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;There are two practical habits doing most of the work here:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Every terminal rule has a &lt;code&gt;counter&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Every terminal rule has a comment that explains intent, not syntax.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;That is what makes the exporter and the dashboard useful later.&lt;&#x2F;p&gt;
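&lt;p&gt;Both habits are easy to check mechanically. The sketch below is a hypothetical lint, not part of the exporter. It assumes the libnftables JSON layout that &lt;code&gt;nft -j list ruleset&lt;&#x2F;code&gt; emits: a top-level &lt;code&gt;nftables&lt;&#x2F;code&gt; list whose rule objects carry &lt;code&gt;family&lt;&#x2F;code&gt;, &lt;code&gt;table&lt;&#x2F;code&gt;, &lt;code&gt;chain&lt;&#x2F;code&gt;, &lt;code&gt;handle&lt;&#x2F;code&gt;, an optional &lt;code&gt;comment&lt;&#x2F;code&gt; and an &lt;code&gt;expr&lt;&#x2F;code&gt; statement list. It treats a rule as terminal when it contains one of a hand-picked set of verdict or NAT statements.&lt;&#x2F;p&gt;

```python
# Sketch: check the two habits against `nft -j list ruleset` JSON
# (libnftables schema). Assumption: a rule is "terminal" if it contains
# one of the statements below; extend the set to match your ruleset.
TERMINAL = {"accept", "drop", "reject", "masquerade", "snat", "dnat"}


def rule_problems(rule: dict) -> list:
    """Return which habits a single rule object violates."""
    keys = {key for stmt in rule.get("expr", []) for key in stmt}
    if not keys.intersection(TERMINAL):
        return []  # non-terminal, e.g. an MSS clamp or a `flow add` rule
    problems = []
    if "counter" not in keys:
        problems.append("no counter")
    if not rule.get("comment"):
        problems.append("no comment")
    return problems


def lint_ruleset(ruleset: dict) -> list:
    """List every terminal rule that lacks a counter or a comment."""
    findings = []
    for item in ruleset.get("nftables", []):
        rule = item.get("rule")
        if rule is None:
            continue  # metainfo, table, chain, set, ... objects
        for problem in rule_problems(rule):
            where = f'{rule["family"]} {rule["table"]} {rule["chain"]}'
            findings.append(f'{where} handle {rule["handle"]}: {problem}')
    return findings
```

&lt;p&gt;Run against the parsed output of &lt;code&gt;nft -j list ruleset&lt;&#x2F;code&gt;, for example as a pre-deploy check, it would flag a bare &lt;code&gt;accept&lt;&#x2F;code&gt; without a counter before that rule ever becomes an unlabelled, invisible series. Non-terminal rules are skipped on purpose.&lt;&#x2F;p&gt;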
&lt;h2 id=&quot;forward-and-nat-chains&quot;&gt;Forward and NAT chains&lt;&#x2F;h2&gt;
&lt;p&gt;The same comment pattern carries over cleanly into forwarding and NAT:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;chain forward {
  type filter hook forward priority filter; policy drop;
  meta oiftype ppp tcp flags syn tcp option maxseg size set 1452 counter comment &amp;quot;ppp mss clamp&amp;quot;
  ip protocol { tcp, udp } ct state established ct status ! dnat flow add @fastnat counter comment &amp;quot;flow offload non-dnat&amp;quot;
  ct state invalid counter drop comment &amp;quot;invalid state&amp;quot;
  iifname { &amp;quot;guest&amp;quot;, &amp;quot;wifi&amp;quot;, &amp;quot;wired&amp;quot;, &amp;quot;mgnt&amp;quot;, &amp;quot;enp1s0f1&amp;quot;, &amp;quot;wg0&amp;quot; } oifname { &amp;quot;ppp0&amp;quot;, &amp;quot;enp1s0f1&amp;quot;, &amp;quot;mobile&amp;quot; } counter accept comment &amp;quot;lan to internet&amp;quot;
  iifname &amp;quot;camera&amp;quot; ip saddr 10.255.200.3 oifname { &amp;quot;ppp0&amp;quot;, &amp;quot;enp1s0f1&amp;quot;, &amp;quot;mobile&amp;quot; } counter accept comment &amp;quot;unvr to internet&amp;quot;
  iifname { &amp;quot;ppp0&amp;quot;, &amp;quot;enp1s0f1&amp;quot;, &amp;quot;mobile&amp;quot; } oifname { &amp;quot;wifi&amp;quot;, &amp;quot;wired&amp;quot;, &amp;quot;mgnt&amp;quot;, &amp;quot;guest&amp;quot;, &amp;quot;camera&amp;quot;, &amp;quot;wg0&amp;quot; } ct state established,related counter accept comment &amp;quot;wan return&amp;quot;
  iifname { &amp;quot;wifi&amp;quot;, &amp;quot;wired&amp;quot;, &amp;quot;mgnt&amp;quot;, &amp;quot;enp1s0f1&amp;quot;, &amp;quot;wg0&amp;quot; } oifname { &amp;quot;wifi&amp;quot;, &amp;quot;wired&amp;quot;, &amp;quot;mgnt&amp;quot;, &amp;quot;guest&amp;quot;, &amp;quot;camera&amp;quot;, &amp;quot;wg0&amp;quot; } counter accept comment &amp;quot;inter-vlan&amp;quot;
  iifname { &amp;quot;wifi&amp;quot;, &amp;quot;wired&amp;quot;, &amp;quot;mgnt&amp;quot;, &amp;quot;wg0&amp;quot; } oifname &amp;quot;camera&amp;quot; counter accept comment &amp;quot;lan to camera&amp;quot;
  iifname &amp;quot;camera&amp;quot; oifname { &amp;quot;wifi&amp;quot;, &amp;quot;wired&amp;quot;, &amp;quot;mgnt&amp;quot;, &amp;quot;wg0&amp;quot; } ct state established,related counter accept comment &amp;quot;camera return&amp;quot;
  iifname { &amp;quot;guest&amp;quot;, &amp;quot;camera&amp;quot; } oifname &amp;quot;wired&amp;quot; ip daddr 10.255.101.202 udp dport 123 counter accept comment &amp;quot;guest+camera ntp&amp;quot;
  log prefix &amp;quot;nft-forward-drop: &amp;quot; counter drop comment &amp;quot;default drop&amp;quot;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;table ip nat {
  chain postrouting {
    type nat hook postrouting priority filter; policy accept;
    oifname &amp;quot;ppp0&amp;quot; counter masquerade comment &amp;quot;ppp0 masquerade&amp;quot;
    oifname &amp;quot;enp1s0f1&amp;quot; counter masquerade comment &amp;quot;vlan masquerade&amp;quot;
    oifname &amp;quot;mobile&amp;quot; counter masquerade comment &amp;quot;mobile masquerade&amp;quot;
    iifname &amp;quot;wired&amp;quot; oifname &amp;quot;wired&amp;quot; ct status dnat counter masquerade comment &amp;quot;hairpin nat&amp;quot;
  }
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Once you commit to this pattern, the ruleset reads better even before Prometheus gets involved.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;exporting-nftables-to-prometheus&quot;&gt;Exporting nftables to Prometheus&lt;&#x2F;h2&gt;
&lt;p&gt;The exporter on this router is a custom NixOS module, &lt;code&gt;prometheus-nftables-exporter&lt;&#x2F;code&gt;, bundled in my &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;ijohanne&#x2F;nur-packages&quot;&gt;&lt;code&gt;ijohanne&#x2F;nur-packages&lt;&#x2F;code&gt;&lt;&#x2F;a&gt; repository. It runs locally, executes &lt;code&gt;nft -j list ruleset&lt;&#x2F;code&gt;, parses the JSON, and emits Prometheus metrics for rule counters plus a small amount of table and chain structure data.&lt;&#x2F;p&gt;
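&lt;p&gt;The collection step itself is small. Here is a minimal Python sketch of it. It assumes &lt;code&gt;nft&lt;&#x2F;code&gt; is on the &lt;code&gt;PATH&lt;&#x2F;code&gt; and that the process has the privileges needed to read the ruleset (CAP_NET_ADMIN, in practice root). The function name is illustrative and not taken from the exporter.&lt;&#x2F;p&gt;

```python
# Sketch of the collection step: run nft and parse its JSON output.
# Assumes `nft` is on PATH and the process may read the ruleset
# (CAP_NET_ADMIN, in practice root).
import json
import subprocess

NFT_CMD = ("nft", "-j", "list", "ruleset")


def load_ruleset(cmd=NFT_CMD, timeout: float = 10.0) -> dict:
    """Run `cmd` and return its standard output decoded as JSON."""
    result = subprocess.run(
        list(cmd), capture_output=True, text=True, check=True, timeout=timeout
    )
    return json.loads(result.stdout)
```

&lt;p&gt;The command is a parameter only so the function can be exercised without a live ruleset. A non-zero exit status raises &lt;code&gt;subprocess.CalledProcessError&lt;&#x2F;code&gt; rather than returning partial data.&lt;&#x2F;p&gt;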
&lt;p&gt;If you want to pull it into a NixOS configuration, the short version is to add the repo as a flake input and import the module:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs = {
    nixpkgs.url = &amp;quot;github:NixOS&amp;#x2F;nixpkgs&amp;#x2F;nixos-unstable&amp;quot;;
    ijohanne-nur.url = &amp;quot;github:ijohanne&amp;#x2F;nur-packages&amp;quot;;
  };

  outputs = { nixpkgs, ijohanne-nur, ... }: {
    nixosConfigurations.router = nixpkgs.lib.nixosSystem {
      modules = [
        ijohanne-nur.nixosModules.prometheus-nftables-exporter
        .&amp;#x2F;configuration.nix
      ];
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Once the module is imported, the service config stays small:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ ... }:

{
  services.prometheus-nftables-exporter = {
    enable = true;
    enableLocalScraping = true;
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;enableLocalScraping = true&lt;&#x2F;code&gt; is the nice part. The router’s local Prometheus instance picks the exporter up automatically, so there is no extra scrape stanza to maintain somewhere else.&lt;&#x2F;p&gt;
&lt;p&gt;The exporter loop is straightforward:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Run &lt;code&gt;nft -j list ruleset&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Walk the tables, chains, and rules.&lt;&#x2F;li&gt;
&lt;li&gt;Extract packet counters, byte counters, handle, action, table, chain, family, and comment.&lt;&#x2F;li&gt;
&lt;li&gt;Expose those fields as Prometheus metrics.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
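&lt;p&gt;Steps 2 to 4 can be sketched in a few dozen lines of Python. This is an illustration, not the exporter’s code. The post does not say how &lt;code&gt;action&lt;&#x2F;code&gt; is derived, so the sketch assumes it is the last verdict-like statement in the rule’s &lt;code&gt;expr&lt;&#x2F;code&gt; list. It also only exports rules that carry an inline counter. Label values are escaped as the Prometheus text format requires, because comments are free text.&lt;&#x2F;p&gt;

```python
# Sketch of steps 2-4: walk `nft -j list ruleset` JSON (libnftables schema)
# and render Prometheus text exposition lines for per-rule counters.
# Assumption: `action` is the last verdict-like statement in a rule's "expr".
VERDICTS = {"accept", "drop", "reject", "masquerade", "snat", "dnat",
            "redirect", "jump", "goto", "return", "queue"}


def escape_label(value: str) -> str:
    """Escape a label value as the Prometheus text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rule_samples(ruleset: dict) -> list:
    """Return exposition lines for nftables_rule_packets and _bytes."""
    packets, octets = [], []
    for item in ruleset.get("nftables", []):
        rule = item.get("rule")
        if rule is None:
            continue  # metainfo, table, chain, set, ... objects
        exprs = rule.get("expr", [])
        counter = next((s["counter"] for s in exprs if "counter" in s), None)
        if not isinstance(counter, dict):
            continue  # no inline counter (or only a named-counter reference)
        action = next(
            (key for s in reversed(exprs) for key in s if key in VERDICTS),
            "none",
        )
        labels = {
            "table": rule["table"],
            "chain": rule["chain"],
            "family": rule["family"],
            "handle": str(rule["handle"]),
            "action": action,
            "comment": rule.get("comment", ""),
        }
        text = ",".join(f'{k}="{escape_label(v)}"' for k, v in labels.items())
        packets.append(f"nftables_rule_packets{{{text}}} {counter['packets']}")
        octets.append(f"nftables_rule_bytes{{{text}}} {counter['bytes']}")
    # Each metric family must be one contiguous group in the text format.
    return (["# TYPE nftables_rule_packets counter"] + packets
            + ["# TYPE nftables_rule_bytes counter"] + octets)
```

&lt;p&gt;For a rule like &lt;code&gt;wan drop&lt;&#x2F;code&gt;, the labels come out in the same order as in the example sample line in the next section. A reference to a named counter object appears in the JSON as a plain string rather than a packets and bytes object, so the sketch skips it.&lt;&#x2F;p&gt;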
&lt;h2 id=&quot;metrics-and-labels&quot;&gt;Metrics and labels&lt;&#x2F;h2&gt;
&lt;p&gt;The rule-level metrics look like this:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;&#x2F;th&gt;&lt;th&gt;Type&lt;&#x2F;th&gt;&lt;th&gt;Labels&lt;&#x2F;th&gt;&lt;th&gt;Description&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;nftables_rule_packets&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;counter&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;table&lt;&#x2F;code&gt;, &lt;code&gt;chain&lt;&#x2F;code&gt;, &lt;code&gt;family&lt;&#x2F;code&gt;, &lt;code&gt;handle&lt;&#x2F;code&gt;, &lt;code&gt;action&lt;&#x2F;code&gt;, &lt;code&gt;comment&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Packets matched by the rule&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;nftables_rule_bytes&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;counter&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;table&lt;&#x2F;code&gt;, &lt;code&gt;chain&lt;&#x2F;code&gt;, &lt;code&gt;family&lt;&#x2F;code&gt;, &lt;code&gt;handle&lt;&#x2F;code&gt;, &lt;code&gt;action&lt;&#x2F;code&gt;, &lt;code&gt;comment&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Bytes matched by the rule&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;And the structural metrics look like this:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;&#x2F;th&gt;&lt;th&gt;Type&lt;&#x2F;th&gt;&lt;th&gt;Labels&lt;&#x2F;th&gt;&lt;th&gt;Description&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;nftables_table_chains&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;gauge&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;table&lt;&#x2F;code&gt;, &lt;code&gt;family&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Number of chains in each table&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;nftables_chain_rules&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;gauge&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;table&lt;&#x2F;code&gt;, &lt;code&gt;chain&lt;&#x2F;code&gt;, &lt;code&gt;family&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Number of rules in each chain&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;nftables_up&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;gauge&lt;&#x2F;td&gt;&lt;td&gt;none&lt;&#x2F;td&gt;&lt;td&gt;Exporter health: &lt;code&gt;1&lt;&#x2F;code&gt; when up, &lt;code&gt;0&lt;&#x2F;code&gt; when down&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
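&lt;p&gt;The structural gauges can come from the same JSON document. Here is a minimal sketch, assuming the gauges are plain counts of the chain and rule objects. It seeds every table and chain with zero, so an empty table still counts towards &lt;code&gt;count(nftables_table_chains)&lt;&#x2F;code&gt; and an empty chain still shows up in &lt;code&gt;nftables_chain_rules&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;

```python
# Sketch: derive the structural gauges from `nft -j list ruleset` JSON.
# Assumption: nftables_table_chains counts chain objects per table and
# nftables_chain_rules counts rule objects per chain.
from collections import Counter


def structure_gauges(ruleset: dict) -> dict:
    """Return {metric name: {label tuple: value}} for the two gauges."""
    chains_per_table = Counter()  # keyed by (family, table)
    rules_per_chain = Counter()   # keyed by (family, table, chain)
    for item in ruleset.get("nftables", []):
        if "table" in item:
            t = item["table"]
            chains_per_table.setdefault((t["family"], t["name"]), 0)
        elif "chain" in item:
            c = item["chain"]
            chains_per_table[(c["family"], c["table"])] += 1
            rules_per_chain.setdefault((c["family"], c["table"], c["name"]), 0)
        elif "rule" in item:
            r = item["rule"]
            rules_per_chain[(r["family"], r["table"], r["chain"])] += 1
    return {
        "nftables_table_chains": dict(chains_per_table),
        "nftables_chain_rules": dict(rules_per_chain),
    }
```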
&lt;p&gt;An example sample line:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;nftables_rule_packets{table=&amp;quot;filter&amp;quot;,chain=&amp;quot;input&amp;quot;,family=&amp;quot;ip&amp;quot;,handle=&amp;quot;15&amp;quot;,action=&amp;quot;drop&amp;quot;,comment=&amp;quot;wan drop&amp;quot;} 48291
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This is exactly why the comment label matters. The kernel allocates rule handles as rules are created, so handle &lt;code&gt;15&lt;&#x2F;code&gt; can become &lt;code&gt;24&lt;&#x2F;code&gt; after a rebuild. The thing you actually care about is still “wan drop”.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-first-promql-queries-that-matter&quot;&gt;The first PromQL queries that matter&lt;&#x2F;h2&gt;
&lt;p&gt;Once the metrics exist, the first few queries are obvious and immediately useful:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;# Rate of dropped packets per rule
rate(nftables_rule_packets{instance=&amp;quot;$instance&amp;quot;,action=&amp;quot;drop&amp;quot;}[$__rate_interval])

# Input chain rules, only active series
rate(nftables_rule_packets{instance=&amp;quot;$instance&amp;quot;,chain=&amp;quot;input&amp;quot;,table=&amp;quot;filter&amp;quot;}[$__rate_interval]) &amp;gt; 0

# Top 10 busiest rules by byte rate
topk(10, rate(nftables_rule_bytes{instance=&amp;quot;$instance&amp;quot;}[$__rate_interval]))

# Aggregate byte rate across all counted rules (a packet that hits several counted rules is summed once per rule)
sum(rate(nftables_rule_bytes{instance=&amp;quot;$instance&amp;quot;}[$__rate_interval]))
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The key design choice is to query and display by &lt;code&gt;comment&lt;&#x2F;code&gt;, not by handle. Handles can still be useful for debugging against a live ruleset, but the dashboard identity should be based on names that survive rebuilds.&lt;&#x2F;p&gt;
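&lt;p&gt;Using &lt;code&gt;comment&lt;&#x2F;code&gt; as the identity has one precondition: it must be unique within a chain. Otherwise two rules share a legend entry and merge under any &lt;code&gt;sum by (chain, comment)&lt;&#x2F;code&gt; aggregation. Below is a hypothetical check for that, run over the same &lt;code&gt;nft -j list ruleset&lt;&#x2F;code&gt; JSON. The function name is illustrative.&lt;&#x2F;p&gt;

```python
# Sketch: find comments that repeat within one (family, table, chain),
# which would make comment-keyed dashboard series ambiguous.
from collections import defaultdict


def duplicate_comments(ruleset: dict) -> dict:
    """Map (family, table, chain, comment) to handles when a comment repeats."""
    handles = defaultdict(list)
    for item in ruleset.get("nftables", []):
        rule = item.get("rule")
        if rule is None or not rule.get("comment"):
            continue  # non-rule objects and uncommented rules are ignored
        key = (rule["family"], rule["table"], rule["chain"], rule["comment"])
        handles[key].append(rule["handle"])
    return {key: hs for key, hs in handles.items() if len(hs) > 1}
```

&lt;p&gt;The ruleset above reuses “invalid state” and “default drop” in both the input and forward chains. That is fine, because the &lt;code&gt;chain&lt;&#x2F;code&gt; label keeps the two apart.&lt;&#x2F;p&gt;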
&lt;h2 id=&quot;building-a-grafana-dashboard-that-answers-real-questions&quot;&gt;Building a Grafana dashboard that answers real questions&lt;&#x2F;h2&gt;
&lt;p&gt;The most important firewall panels are not the prettiest ones. They are the ones that answer:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;What is getting dropped right now?&lt;&#x2F;li&gt;
&lt;li&gt;Which chain is busy?&lt;&#x2F;li&gt;
&lt;li&gt;Is the guest or camera network doing something unusual?&lt;&#x2F;li&gt;
&lt;li&gt;Did a ruleset change remove an expected counter?&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That is why the first row I care about is drops and rejects, not total traffic.&lt;&#x2F;p&gt;
&lt;p&gt;The dashboard layout I ended up with looks like this:&lt;&#x2F;p&gt;
&lt;h3 id=&quot;overview&quot;&gt;Overview&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;Exporter status&lt;&#x2F;li&gt;
&lt;li&gt;Table count&lt;&#x2F;li&gt;
&lt;li&gt;Chain count&lt;&#x2F;li&gt;
&lt;li&gt;Rule count&lt;&#x2F;li&gt;
&lt;li&gt;Aggregate rule traffic&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;drops-and-rejects&quot;&gt;Drops and rejects&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;Dropped packets by rule&lt;&#x2F;li&gt;
&lt;li&gt;Dropped bytes by rule&lt;&#x2F;li&gt;
&lt;li&gt;An all-drops view broken out by chain&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;This is the panel group I watch first during scans, misconfigurations, or sudden noise on the WAN.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;chain-specific-rows&quot;&gt;Chain-specific rows&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;Input packets and bytes by rule&lt;&#x2F;li&gt;
&lt;li&gt;Forward packets and bytes by rule&lt;&#x2F;li&gt;
&lt;li&gt;NAT prerouting and postrouting packets by rule&lt;&#x2F;li&gt;
&lt;li&gt;IPv6 input and forward packets by rule&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;top-rules-and-structure&quot;&gt;Top rules and structure&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;Top 10 rules by bytes&lt;&#x2F;li&gt;
&lt;li&gt;Top 10 rules by packets&lt;&#x2F;li&gt;
&lt;li&gt;Table and chain structure tables&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The template variables are minimal:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;$datasource&lt;&#x2F;code&gt; for the Prometheus source&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;$instance&lt;&#x2F;code&gt; from &lt;code&gt;label_values(nftables_up, instance)&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;complete-dashboard-json&quot;&gt;Complete dashboard JSON&lt;&#x2F;h2&gt;
&lt;p&gt;Import this via Grafana’s dashboard import UI, or provision it as a file:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;json&quot; class=&quot;language-json &quot;&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;{
  &amp;quot;annotations&amp;quot;: {
    &amp;quot;list&amp;quot;: []
  },
  &amp;quot;editable&amp;quot;: true,
  &amp;quot;fiscalYearStartMonth&amp;quot;: 0,
  &amp;quot;graphTooltip&amp;quot;: 1,
  &amp;quot;links&amp;quot;: [],
  &amp;quot;panels&amp;quot;: [
    {
      &amp;quot;collapsed&amp;quot;: false,
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 1, &amp;quot;w&amp;quot;: 24, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 0 },
      &amp;quot;id&amp;quot;: 100,
      &amp;quot;title&amp;quot;: &amp;quot;Overview&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;row&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;mappings&amp;quot;: [{ &amp;quot;options&amp;quot;: { &amp;quot;0&amp;quot;: { &amp;quot;text&amp;quot;: &amp;quot;Down&amp;quot;, &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot; }, &amp;quot;1&amp;quot;: { &amp;quot;text&amp;quot;: &amp;quot;Up&amp;quot;, &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot; } }, &amp;quot;type&amp;quot;: &amp;quot;value&amp;quot; }],
          &amp;quot;thresholds&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;, &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot;, &amp;quot;value&amp;quot;: null }, { &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: 1 }] }
        }
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 4, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 1 },
      &amp;quot;id&amp;quot;: 1,
      &amp;quot;options&amp;quot;: { &amp;quot;colorMode&amp;quot;: &amp;quot;background&amp;quot;, &amp;quot;graphMode&amp;quot;: &amp;quot;none&amp;quot;, &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;] }, &amp;quot;textMode&amp;quot;: &amp;quot;value&amp;quot; },
      &amp;quot;title&amp;quot;: &amp;quot;Exporter Status&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;nftables_up{instance=\&amp;quot;$instance\&amp;quot;}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;thresholds&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;, &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;blue&amp;quot;, &amp;quot;value&amp;quot;: null }] } } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 5, &amp;quot;x&amp;quot;: 4, &amp;quot;y&amp;quot;: 1 },
      &amp;quot;id&amp;quot;: 2,
      &amp;quot;options&amp;quot;: { &amp;quot;colorMode&amp;quot;: &amp;quot;background&amp;quot;, &amp;quot;graphMode&amp;quot;: &amp;quot;none&amp;quot;, &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;] }, &amp;quot;textMode&amp;quot;: &amp;quot;value&amp;quot; },
      &amp;quot;title&amp;quot;: &amp;quot;Tables&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;count(nftables_table_chains{instance=\&amp;quot;$instance\&amp;quot;})&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;thresholds&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;, &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;blue&amp;quot;, &amp;quot;value&amp;quot;: null }] } } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 5, &amp;quot;x&amp;quot;: 9, &amp;quot;y&amp;quot;: 1 },
      &amp;quot;id&amp;quot;: 3,
      &amp;quot;options&amp;quot;: { &amp;quot;colorMode&amp;quot;: &amp;quot;background&amp;quot;, &amp;quot;graphMode&amp;quot;: &amp;quot;none&amp;quot;, &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;] }, &amp;quot;textMode&amp;quot;: &amp;quot;value&amp;quot; },
      &amp;quot;title&amp;quot;: &amp;quot;Chains&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;sum(nftables_table_chains{instance=\&amp;quot;$instance\&amp;quot;})&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;thresholds&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;, &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;blue&amp;quot;, &amp;quot;value&amp;quot;: null }] } } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 5, &amp;quot;x&amp;quot;: 14, &amp;quot;y&amp;quot;: 1 },
      &amp;quot;id&amp;quot;: 4,
      &amp;quot;options&amp;quot;: { &amp;quot;colorMode&amp;quot;: &amp;quot;background&amp;quot;, &amp;quot;graphMode&amp;quot;: &amp;quot;none&amp;quot;, &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;] }, &amp;quot;textMode&amp;quot;: &amp;quot;value&amp;quot; },
      &amp;quot;title&amp;quot;: &amp;quot;Rules&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;sum(nftables_chain_rules{instance=\&amp;quot;$instance\&amp;quot;})&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;unit&amp;quot;: &amp;quot;Bps&amp;quot;, &amp;quot;thresholds&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;, &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: null }] } } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 5, &amp;quot;x&amp;quot;: 19, &amp;quot;y&amp;quot;: 1 },
      &amp;quot;id&amp;quot;: 5,
      &amp;quot;options&amp;quot;: { &amp;quot;colorMode&amp;quot;: &amp;quot;background&amp;quot;, &amp;quot;graphMode&amp;quot;: &amp;quot;area&amp;quot;, &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;] }, &amp;quot;textMode&amp;quot;: &amp;quot;value&amp;quot; },
      &amp;quot;title&amp;quot;: &amp;quot;Total Rule Traffic&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;sum(rate(nftables_rule_bytes{instance=\&amp;quot;$instance\&amp;quot;}[$__rate_interval]))&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;collapsed&amp;quot;: false, &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 1, &amp;quot;w&amp;quot;: 24, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 5 }, &amp;quot;id&amp;quot;: 101, &amp;quot;title&amp;quot;: &amp;quot;Drops &amp;amp; Rejects&amp;quot;, &amp;quot;type&amp;quot;: &amp;quot;row&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; }, &amp;quot;custom&amp;quot;: { &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;spanNulls&amp;quot;: false }, &amp;quot;unit&amp;quot;: &amp;quot;pps&amp;quot; } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 6 },
      &amp;quot;id&amp;quot;: 10,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;right&amp;quot;, &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;max&amp;quot;, &amp;quot;lastNotNull&amp;quot;] }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;Dropped Packets (rate)&amp;quot;,
      &amp;quot;description&amp;quot;: &amp;quot;All rules with drop action — key indicator of blocked traffic&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;rate(nftables_rule_packets{instance=\&amp;quot;$instance\&amp;quot;,action=\&amp;quot;drop\&amp;quot;}[$__rate_interval])&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{chain}} — {{comment}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; }, &amp;quot;custom&amp;quot;: { &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;spanNulls&amp;quot;: false }, &amp;quot;unit&amp;quot;: &amp;quot;Bps&amp;quot; } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 12, &amp;quot;y&amp;quot;: 6 },
      &amp;quot;id&amp;quot;: 11,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;right&amp;quot;, &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;max&amp;quot;, &amp;quot;lastNotNull&amp;quot;] }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;Dropped Bytes (rate)&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;rate(nftables_rule_bytes{instance=\&amp;quot;$instance\&amp;quot;,action=\&amp;quot;drop\&amp;quot;}[$__rate_interval])&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{chain}} — {{comment}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; }, &amp;quot;custom&amp;quot;: { &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;spanNulls&amp;quot;: false }, &amp;quot;unit&amp;quot;: &amp;quot;pps&amp;quot; } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 24, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 14 },
      &amp;quot;id&amp;quot;: 12,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;right&amp;quot;, &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;max&amp;quot;, &amp;quot;lastNotNull&amp;quot;] }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;All Drops by Chain&amp;quot;,
      &amp;quot;description&amp;quot;: &amp;quot;All rules with drop action — catch-all and explicit drops&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;rate(nftables_rule_packets{instance=\&amp;quot;$instance\&amp;quot;,action=\&amp;quot;drop\&amp;quot;}[$__rate_interval])&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{family}}&amp;#x2F;{{chain}} handle={{handle}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;collapsed&amp;quot;: false, &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 1, &amp;quot;w&amp;quot;: 24, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 22 }, &amp;quot;id&amp;quot;: 102, &amp;quot;title&amp;quot;: &amp;quot;Input Chain&amp;quot;, &amp;quot;type&amp;quot;: &amp;quot;row&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; }, &amp;quot;custom&amp;quot;: { &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;spanNulls&amp;quot;: false, &amp;quot;stacking&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;normal&amp;quot; } }, &amp;quot;unit&amp;quot;: &amp;quot;pps&amp;quot; } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 23 },
      &amp;quot;id&amp;quot;: 20,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;right&amp;quot;, &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;lastNotNull&amp;quot;] }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;Input — Packets by Rule&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;rate(nftables_rule_packets{instance=\&amp;quot;$instance\&amp;quot;,chain=\&amp;quot;input\&amp;quot;,table=\&amp;quot;filter\&amp;quot;}[$__rate_interval]) &amp;gt; 0&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{action}} {{comment}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; }, &amp;quot;custom&amp;quot;: { &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;spanNulls&amp;quot;: false, &amp;quot;stacking&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;normal&amp;quot; } }, &amp;quot;unit&amp;quot;: &amp;quot;Bps&amp;quot; } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 12, &amp;quot;y&amp;quot;: 23 },
      &amp;quot;id&amp;quot;: 21,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;right&amp;quot;, &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;lastNotNull&amp;quot;] }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;Input — Bytes by Rule&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;rate(nftables_rule_bytes{instance=\&amp;quot;$instance\&amp;quot;,chain=\&amp;quot;input\&amp;quot;,table=\&amp;quot;filter\&amp;quot;}[$__rate_interval]) &amp;gt; 0&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{action}} {{comment}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;collapsed&amp;quot;: false, &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 1, &amp;quot;w&amp;quot;: 24, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 31 }, &amp;quot;id&amp;quot;: 103, &amp;quot;title&amp;quot;: &amp;quot;Forward Chain&amp;quot;, &amp;quot;type&amp;quot;: &amp;quot;row&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; }, &amp;quot;custom&amp;quot;: { &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;spanNulls&amp;quot;: false, &amp;quot;stacking&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;normal&amp;quot; } }, &amp;quot;unit&amp;quot;: &amp;quot;pps&amp;quot; } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 32 },
      &amp;quot;id&amp;quot;: 30,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;right&amp;quot;, &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;lastNotNull&amp;quot;] }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;Forward — Packets by Rule&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;rate(nftables_rule_packets{instance=\&amp;quot;$instance\&amp;quot;,chain=\&amp;quot;forward\&amp;quot;,table=\&amp;quot;filter\&amp;quot;}[$__rate_interval]) &amp;gt; 0&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{action}} {{comment}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; }, &amp;quot;custom&amp;quot;: { &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;spanNulls&amp;quot;: false, &amp;quot;stacking&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;normal&amp;quot; } }, &amp;quot;unit&amp;quot;: &amp;quot;Bps&amp;quot; } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 12, &amp;quot;y&amp;quot;: 32 },
      &amp;quot;id&amp;quot;: 31,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;right&amp;quot;, &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;lastNotNull&amp;quot;] }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;Forward — Bytes by Rule&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;rate(nftables_rule_bytes{instance=\&amp;quot;$instance\&amp;quot;,chain=\&amp;quot;forward\&amp;quot;,table=\&amp;quot;filter\&amp;quot;}[$__rate_interval]) &amp;gt; 0&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{action}} {{comment}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;collapsed&amp;quot;: false, &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 1, &amp;quot;w&amp;quot;: 24, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 40 }, &amp;quot;id&amp;quot;: 104, &amp;quot;title&amp;quot;: &amp;quot;NAT&amp;quot;, &amp;quot;type&amp;quot;: &amp;quot;row&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; }, &amp;quot;custom&amp;quot;: { &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;spanNulls&amp;quot;: false, &amp;quot;stacking&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;normal&amp;quot; } }, &amp;quot;unit&amp;quot;: &amp;quot;pps&amp;quot; } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 41 },
      &amp;quot;id&amp;quot;: 40,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;right&amp;quot;, &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;lastNotNull&amp;quot;] }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;NAT Prerouting — Packets by Rule&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;rate(nftables_rule_packets{instance=\&amp;quot;$instance\&amp;quot;,chain=\&amp;quot;prerouting\&amp;quot;,table=\&amp;quot;nat\&amp;quot;}[$__rate_interval]) &amp;gt; 0&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{action}} {{comment}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; }, &amp;quot;custom&amp;quot;: { &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;spanNulls&amp;quot;: false, &amp;quot;stacking&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;normal&amp;quot; } }, &amp;quot;unit&amp;quot;: &amp;quot;pps&amp;quot; } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 12, &amp;quot;y&amp;quot;: 41 },
      &amp;quot;id&amp;quot;: 41,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;right&amp;quot;, &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;lastNotNull&amp;quot;] }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;NAT Postrouting — Packets by Rule&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;rate(nftables_rule_packets{instance=\&amp;quot;$instance\&amp;quot;,chain=\&amp;quot;postrouting\&amp;quot;,table=\&amp;quot;nat\&amp;quot;}[$__rate_interval]) &amp;gt; 0&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{action}} {{comment}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;collapsed&amp;quot;: false, &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 1, &amp;quot;w&amp;quot;: 24, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 49 }, &amp;quot;id&amp;quot;: 105, &amp;quot;title&amp;quot;: &amp;quot;IPv6 Filter&amp;quot;, &amp;quot;type&amp;quot;: &amp;quot;row&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; }, &amp;quot;custom&amp;quot;: { &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;spanNulls&amp;quot;: false, &amp;quot;stacking&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;normal&amp;quot; } }, &amp;quot;unit&amp;quot;: &amp;quot;pps&amp;quot; } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 50 },
      &amp;quot;id&amp;quot;: 50,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;right&amp;quot;, &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;lastNotNull&amp;quot;] }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;IPv6 Input — Packets by Rule&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;rate(nftables_rule_packets{instance=\&amp;quot;$instance\&amp;quot;,chain=\&amp;quot;input\&amp;quot;,family=\&amp;quot;ip6\&amp;quot;}[$__rate_interval]) &amp;gt; 0&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{table}}&amp;#x2F;{{chain}} {{action}} {{comment}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; }, &amp;quot;custom&amp;quot;: { &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;spanNulls&amp;quot;: false, &amp;quot;stacking&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;normal&amp;quot; } }, &amp;quot;unit&amp;quot;: &amp;quot;pps&amp;quot; } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 12, &amp;quot;y&amp;quot;: 50 },
      &amp;quot;id&amp;quot;: 51,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;right&amp;quot;, &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;lastNotNull&amp;quot;] }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;IPv6 Forward — Packets by Rule&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;rate(nftables_rule_packets{instance=\&amp;quot;$instance\&amp;quot;,chain=\&amp;quot;forward\&amp;quot;,family=\&amp;quot;ip6\&amp;quot;}[$__rate_interval]) &amp;gt; 0&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{table}}&amp;#x2F;{{chain}} {{action}} {{comment}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;collapsed&amp;quot;: false, &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 1, &amp;quot;w&amp;quot;: 24, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 58 }, &amp;quot;id&amp;quot;: 106, &amp;quot;title&amp;quot;: &amp;quot;Top Rules&amp;quot;, &amp;quot;type&amp;quot;: &amp;quot;row&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; }, &amp;quot;custom&amp;quot;: { &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;spanNulls&amp;quot;: false }, &amp;quot;unit&amp;quot;: &amp;quot;Bps&amp;quot; } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 59 },
      &amp;quot;id&amp;quot;: 60,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;right&amp;quot;, &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;max&amp;quot;] }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;Top 10 Rules by Bytes&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;topk(10, rate(nftables_rule_bytes{instance=\&amp;quot;$instance\&amp;quot;}[$__rate_interval]))&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{table}}&amp;#x2F;{{chain}} {{action}} {{comment}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: { &amp;quot;defaults&amp;quot;: { &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; }, &amp;quot;custom&amp;quot;: { &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;spanNulls&amp;quot;: false }, &amp;quot;unit&amp;quot;: &amp;quot;pps&amp;quot; } },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 12, &amp;quot;y&amp;quot;: 59 },
      &amp;quot;id&amp;quot;: 61,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;right&amp;quot;, &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;max&amp;quot;] }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;Top 10 Rules by Packets&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;topk(10, rate(nftables_rule_packets{instance=\&amp;quot;$instance\&amp;quot;}[$__rate_interval]))&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{table}}&amp;#x2F;{{chain}} {{action}} {{comment}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }]
    },
    {
      &amp;quot;collapsed&amp;quot;: false, &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 1, &amp;quot;w&amp;quot;: 24, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 67 }, &amp;quot;id&amp;quot;: 107, &amp;quot;title&amp;quot;: &amp;quot;Structure&amp;quot;, &amp;quot;type&amp;quot;: &amp;quot;row&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: { &amp;quot;custom&amp;quot;: { &amp;quot;align&amp;quot;: &amp;quot;auto&amp;quot; }, &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;thresholds&amp;quot; }, &amp;quot;thresholds&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;, &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;blue&amp;quot;, &amp;quot;value&amp;quot;: null }] } },
        &amp;quot;overrides&amp;quot;: [{ &amp;quot;matcher&amp;quot;: { &amp;quot;id&amp;quot;: &amp;quot;byName&amp;quot;, &amp;quot;options&amp;quot;: &amp;quot;Chains&amp;quot; }, &amp;quot;properties&amp;quot;: [{ &amp;quot;id&amp;quot;: &amp;quot;custom.cellOptions&amp;quot;, &amp;quot;value&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;color-background&amp;quot; } }] }, { &amp;quot;matcher&amp;quot;: { &amp;quot;id&amp;quot;: &amp;quot;byName&amp;quot;, &amp;quot;options&amp;quot;: &amp;quot;Rules&amp;quot; }, &amp;quot;properties&amp;quot;: [{ &amp;quot;id&amp;quot;: &amp;quot;custom.cellOptions&amp;quot;, &amp;quot;value&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;color-background&amp;quot; } }] }]
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 6, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 68 },
      &amp;quot;id&amp;quot;: 70,
      &amp;quot;options&amp;quot;: { &amp;quot;showHeader&amp;quot;: true },
      &amp;quot;title&amp;quot;: &amp;quot;Tables &amp;amp; Chains&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;table&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;nftables_table_chains{instance=\&amp;quot;$instance\&amp;quot;}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot;, &amp;quot;instant&amp;quot;: true, &amp;quot;format&amp;quot;: &amp;quot;table&amp;quot; }],
      &amp;quot;transformations&amp;quot;: [{ &amp;quot;id&amp;quot;: &amp;quot;organize&amp;quot;, &amp;quot;options&amp;quot;: { &amp;quot;excludeByName&amp;quot;: { &amp;quot;Time&amp;quot;: true, &amp;quot;__name__&amp;quot;: true, &amp;quot;instance&amp;quot;: true, &amp;quot;job&amp;quot;: true }, &amp;quot;renameByName&amp;quot;: { &amp;quot;table&amp;quot;: &amp;quot;Table&amp;quot;, &amp;quot;family&amp;quot;: &amp;quot;Family&amp;quot;, &amp;quot;Value&amp;quot;: &amp;quot;Chains&amp;quot; } } }]
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: { &amp;quot;custom&amp;quot;: { &amp;quot;align&amp;quot;: &amp;quot;auto&amp;quot; }, &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;thresholds&amp;quot; }, &amp;quot;thresholds&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;, &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;blue&amp;quot;, &amp;quot;value&amp;quot;: null }] } },
        &amp;quot;overrides&amp;quot;: [{ &amp;quot;matcher&amp;quot;: { &amp;quot;id&amp;quot;: &amp;quot;byName&amp;quot;, &amp;quot;options&amp;quot;: &amp;quot;Rules&amp;quot; }, &amp;quot;properties&amp;quot;: [{ &amp;quot;id&amp;quot;: &amp;quot;custom.cellOptions&amp;quot;, &amp;quot;value&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;color-background&amp;quot; } }] }]
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 6, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 12, &amp;quot;y&amp;quot;: 68 },
      &amp;quot;id&amp;quot;: 71,
      &amp;quot;options&amp;quot;: { &amp;quot;showHeader&amp;quot;: true },
      &amp;quot;title&amp;quot;: &amp;quot;Chains &amp;amp; Rules&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;table&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;nftables_chain_rules{instance=\&amp;quot;$instance\&amp;quot;}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot;, &amp;quot;instant&amp;quot;: true, &amp;quot;format&amp;quot;: &amp;quot;table&amp;quot; }],
      &amp;quot;transformations&amp;quot;: [{ &amp;quot;id&amp;quot;: &amp;quot;organize&amp;quot;, &amp;quot;options&amp;quot;: { &amp;quot;excludeByName&amp;quot;: { &amp;quot;Time&amp;quot;: true, &amp;quot;__name__&amp;quot;: true, &amp;quot;instance&amp;quot;: true, &amp;quot;job&amp;quot;: true }, &amp;quot;renameByName&amp;quot;: { &amp;quot;table&amp;quot;: &amp;quot;Table&amp;quot;, &amp;quot;family&amp;quot;: &amp;quot;Family&amp;quot;, &amp;quot;chain&amp;quot;: &amp;quot;Chain&amp;quot;, &amp;quot;Value&amp;quot;: &amp;quot;Rules&amp;quot; } } }]
    }
  ],
  &amp;quot;schemaVersion&amp;quot;: 39,
  &amp;quot;tags&amp;quot;: [&amp;quot;nftables&amp;quot;, &amp;quot;firewall&amp;quot;],
  &amp;quot;templating&amp;quot;: {
    &amp;quot;list&amp;quot;: [
      { &amp;quot;current&amp;quot;: {}, &amp;quot;includeAll&amp;quot;: false, &amp;quot;multi&amp;quot;: false, &amp;quot;name&amp;quot;: &amp;quot;datasource&amp;quot;, &amp;quot;query&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;refresh&amp;quot;: 1, &amp;quot;type&amp;quot;: &amp;quot;datasource&amp;quot; },
      { &amp;quot;current&amp;quot;: {}, &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; }, &amp;quot;definition&amp;quot;: &amp;quot;label_values(nftables_up, instance)&amp;quot;, &amp;quot;includeAll&amp;quot;: false, &amp;quot;multi&amp;quot;: false, &amp;quot;name&amp;quot;: &amp;quot;instance&amp;quot;, &amp;quot;query&amp;quot;: { &amp;quot;query&amp;quot;: &amp;quot;label_values(nftables_up, instance)&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;StandardVariableQuery&amp;quot; }, &amp;quot;refresh&amp;quot;: 2, &amp;quot;sort&amp;quot;: 1, &amp;quot;type&amp;quot;: &amp;quot;query&amp;quot; }
    ]
  },
  &amp;quot;time&amp;quot;: { &amp;quot;from&amp;quot;: &amp;quot;now-6h&amp;quot;, &amp;quot;to&amp;quot;: &amp;quot;now&amp;quot; },
  &amp;quot;timepicker&amp;quot;: {},
  &amp;quot;timezone&amp;quot;: &amp;quot;&amp;quot;,
  &amp;quot;title&amp;quot;: &amp;quot;nftables Firewall&amp;quot;,
  &amp;quot;uid&amp;quot;: &amp;quot;nftables-firewall&amp;quot;,
  &amp;quot;refresh&amp;quot;: &amp;quot;30s&amp;quot;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;quick-health-checks&quot;&gt;Quick health checks&lt;&#x2F;h2&gt;
&lt;p&gt;These are the first commands I use to verify that the whole stack is telling the truth:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Verify the ruleset actually contains counters
sudo nft list ruleset | grep -c &amp;quot;counter&amp;quot;

# Check that every counted rule has a comment
sudo nft list ruleset | grep &amp;quot;counter&amp;quot; | grep -v &amp;quot;comment&amp;quot;
# This should print nothing.

# Spot-check live rules with comments
sudo nft list ruleset | grep &amp;quot;comment&amp;quot; | head -20

# Confirm the exporter is running
systemctl status prometheus-nftables-exporter

# Check the metrics endpoint
curl -s http:&amp;#x2F;&amp;#x2F;localhost:9630&amp;#x2F;metrics | grep nftables_up

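# Find the drop rules with the most packets since their counters were last reset.
# The sample value is the last field because comments can contain spaces. awk also expands
# exponent notation such as 1.2e+06 so that sort -n orders large counters correctly.
curl -s http:&amp;#x2F;&amp;#x2F;localhost:9630&amp;#x2F;metrics | grep &amp;#x27;^nftables_rule_packets{&amp;#x27; | grep &amp;#x27;action=&amp;quot;drop&amp;quot;&amp;#x27; | awk &amp;#x27;{ printf &amp;quot;%.0f %s\n&amp;quot;, $NF, $0 }&amp;#x27; | sort -rn | head -5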
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And if the exporter looks wrong, inspect the raw nftables JSON directly:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;sudo nft -j list ruleset | jq .
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
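&lt;p&gt;When the exporter and the ruleset seem to disagree, it can help to reduce the raw JSON to roughly the shape the exporter exposes. The sketch below assumes jq is installed. It relies on the libnftables JSON layout, where each rule object carries &lt;code&gt;family&lt;&#x2F;code&gt;, &lt;code&gt;table&lt;&#x2F;code&gt;, &lt;code&gt;chain&lt;&#x2F;code&gt;, an optional &lt;code&gt;comment&lt;&#x2F;code&gt;, and an &lt;code&gt;expr&lt;&#x2F;code&gt; array that may contain a &lt;code&gt;counter&lt;&#x2F;code&gt; statement. The function name &lt;code&gt;nft_counted_rules&lt;&#x2F;code&gt; is made up for this example.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: print one tab-separated line per counted rule with
# family&amp;#x2F;table&amp;#x2F;chain, comment (or -), packets and bytes.
# Named counter references (strings instead of objects) are skipped.
nft_counted_rules() {
  jq -r &amp;#x27;
    .nftables[]
    | select(has(&amp;quot;rule&amp;quot;)) | .rule
    | (.expr[]? | objects | .counter | objects) as $c
    | &amp;quot;\(.family)&amp;#x2F;\(.table)&amp;#x2F;\(.chain)\t\(.comment &amp;#x2F;&amp;#x2F; &amp;quot;-&amp;quot;)\t\($c.packets)\t\($c.bytes)&amp;quot;
  &amp;#x27;
}

# Usage:
# sudo nft -j list ruleset | nft_counted_rules
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If a rule appears here but not on the metrics endpoint, the problem is on the exporter side. If a rule is missing here, it has no inline counter to begin with.&lt;&#x2F;p&gt;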
&lt;h2 id=&quot;common-failure-modes&quot;&gt;Common failure modes&lt;&#x2F;h2&gt;
&lt;p&gt;There are a few ways to make this look more broken than it is:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Rules without &lt;code&gt;counter&lt;&#x2F;code&gt; will never show up as time series.&lt;&#x2F;li&gt;
&lt;li&gt;Rules without &lt;code&gt;comment&lt;&#x2F;code&gt; will show up, but they will be hard to distinguish in dashboards.&lt;&#x2F;li&gt;
&lt;li&gt;Rule handles can change when the ruleset is flushed and reloaded or rebuilt. That is exactly why the dashboard should key off comments instead.&lt;&#x2F;li&gt;
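&lt;li&gt;A very large ruleset means many time series. Every counted rule gets its own series in each rule metric, so one rule shows up in both &lt;code&gt;nftables_rule_packets&lt;&#x2F;code&gt; and &lt;code&gt;nftables_rule_bytes&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;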
&lt;li&gt;The exporter needs enough privilege to run &lt;code&gt;nft list ruleset&lt;&#x2F;code&gt;; the NixOS module handles that, but it is worth remembering when debugging.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The first two are the important ones. If a rule matters, give it a counter and a comment.&lt;&#x2F;p&gt;
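&lt;p&gt;For the cardinality point, you can get a rough count of exported series per metric name straight from the endpoint. This is a minimal sketch that only assumes the Prometheus text exposition format and the &lt;code&gt;localhost:9630&lt;&#x2F;code&gt; endpoint used in the health checks. The name &lt;code&gt;series_per_metric&lt;&#x2F;code&gt; is a hypothetical helper name.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: count series per metric name in Prometheus text format.
series_per_metric() {
  # Skip HELP and TYPE comment lines and blank lines, cut the label set off
  # the first field, and count what is left (the metric name).
  awk &amp;#x27;!&amp;#x2F;^#&amp;#x2F; &amp;amp;&amp;amp; NF { name = $1; sub(&amp;#x2F;[{].*&amp;#x2F;, &amp;quot;&amp;quot;, name); count[name]++ }
       END { for (n in count) print count[n], n }&amp;#x27; | sort -rn
}

# Usage:
# curl -s http:&amp;#x2F;&amp;#x2F;localhost:9630&amp;#x2F;metrics | series_per_metric | head
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;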
&lt;h2 id=&quot;the-transferable-idea&quot;&gt;The transferable idea&lt;&#x2F;h2&gt;
&lt;p&gt;The exporter is useful, but the real pattern here is bigger than one exporter or one firewall.&lt;&#x2F;p&gt;
&lt;p&gt;When a system already has counters but poor names, add names close to the source and export those names as labels.&lt;&#x2F;p&gt;
&lt;p&gt;That is the whole trick.&lt;&#x2F;p&gt;
&lt;p&gt;In nftables, &lt;code&gt;comment&lt;&#x2F;code&gt; is the naming layer. Once every rule carries intent alongside behavior, Prometheus and Grafana can do the rest. Your firewall stops being a wall of anonymous handles and starts being an observable system you can actually operate.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Hrafnsyn: Unified Air and Sea Tracking with Phoenix LiveView, PostGIS, and Nix</title>
        <published>2026-04-11T15:30:00+00:00</published>
        <updated>2026-04-11T15:30:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/hrafnsyn-unified-air-and-sea-tracking/"/>
        <id>https://perlpimp.net/blog/hrafnsyn-unified-air-and-sea-tracking/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/hrafnsyn-unified-air-and-sea-tracking/">&lt;p&gt;I like projects that are easy to explain in one sentence and then keep getting more interesting the closer you look.&lt;&#x2F;p&gt;
&lt;p&gt;Hrafnsyn starts with a very simple pitch: it puts aircraft and vessels on one living map. ADS-B on the plane side, AIS on the boat side, Phoenix LiveView in front, PostgreSQL&#x2F;PostGIS underneath, and realtime updates the whole way through.&lt;&#x2F;p&gt;
&lt;p&gt;That is already a fun project. But the part that makes it satisfying to me is that it is not just a map widget or a repo full of half-connected experiments. It is a composed operational system. Collectors, a stable ingest boundary, normalized persistence, a useful UI, auth modes, monitoring, and Nix-native deployment all line up in one codebase.&lt;&#x2F;p&gt;
&lt;p&gt;And now it is not just local anymore. Hrafnsyn is deployed from a real NixOS configuration, consumed through my NUR package&#x2F;module path, monitored as a real service, and using a packaged aircraft-enrichment dataset that materially improves the aircraft side. That changes the story from “interesting Phoenix side project” to “this is actually coherent end to end.”&lt;&#x2F;p&gt;
&lt;h2 id=&quot;one-map-two-feed-types-one-ingest-boundary&quot;&gt;One map, two feed types, one ingest boundary&lt;&#x2F;h2&gt;
&lt;p&gt;The UI headline is obvious: aircraft and vessels share one dashboard. But the architectural point is the seam behind it.&lt;&#x2F;p&gt;
&lt;p&gt;Hrafnsyn is intentionally split into three layers:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;source collection&lt;&#x2F;li&gt;
&lt;li&gt;normalized ingest and persistence&lt;&#x2F;li&gt;
&lt;li&gt;realtime presentation&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Each upstream feed gets its own long-lived collector process. Right now the contracts are deliberately boring:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;aircraft via &lt;code&gt;GET &#x2F;data&#x2F;aircraft.json&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;vessels via &lt;code&gt;GET &#x2F;api&#x2F;ships_array.json&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That is exactly what I want here. One process per source, isolated restart behavior, clear source identity for logs and metrics, and no special cases once the payload has been normalized.&lt;&#x2F;p&gt;
&lt;p&gt;The interesting part is that both collectors converge on a very small shared ingest path:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;elixir&quot; class=&quot;language-elixir &quot;&gt;&lt;code class=&quot;language-elixir&quot; data-lang=&quot;elixir&quot;&gt;def ingest_batch(source, observations) do
  touched_ids =
    observations
    |&amp;gt; Enum.reduce([], fn observation, acc -&amp;gt;
      with {:ok, normalized} &amp;lt;- Observation.new(observation),
           {:ok, track} &amp;lt;- upsert_track(source, normalized),
           {:ok, _point} &amp;lt;- insert_track_point(track, source, normalized) do
        [track.id | acc]
      else
        _ -&amp;gt; acc
      end
    end)
    |&amp;gt; Enum.uniq()

  if touched_ids != [] do
    Phoenix.PubSub.broadcast(Hrafnsyn.PubSub, @topic, {:tracks_updated, touched_ids})
  end

  {:ok, touched_ids}
end
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That is the core of the whole app. HTTP collectors use it. Future publishers can use it. The gRPC ingest surface can use it too. Once observations cross that boundary, the rest of the system does not need to care where they came from.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-merge-model-is-what-makes-it-useful&quot;&gt;The merge model is what makes it useful&lt;&#x2F;h2&gt;
&lt;p&gt;The storage layout is where the “unified tracker” idea turns into something more than a single map layer.&lt;&#x2F;p&gt;
&lt;p&gt;Hrafnsyn keeps two related tables:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;tracks&lt;&#x2F;code&gt; for the latest merged state of a vehicle&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;track_points&lt;&#x2F;code&gt; for the append-only history&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The schema makes the merge rules explicit:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;elixir&quot; class=&quot;language-elixir &quot;&gt;&lt;code class=&quot;language-elixir&quot; data-lang=&quot;elixir&quot;&gt;create unique_index(:tracks, [:vehicle_type, :identity])
create unique_index(:track_points, [:track_id, :source_id, :observed_at])
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That means a track is unique per vehicle type and identity. An aircraft is identified by its ICAO hex address and a vessel by its MMSI. If multiple sources report the same vehicle, they merge into one current track. Every source still leaves behind its own historical points, with at most one point per track, source, and observation timestamp.&lt;&#x2F;p&gt;
&lt;p&gt;That combination is the right one for this kind of system:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;the dashboard shows a clean current state&lt;&#x2F;li&gt;
&lt;li&gt;search works against a normalized identity surface&lt;&#x2F;li&gt;
&lt;li&gt;route history stays durable instead of evaporating with the last websocket message&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The LiveView side stays pleasantly direct because persistence and fan-out are already solved. When tracks update, the ingest path broadcasts the touched track IDs over Phoenix PubSub. The dashboard then refreshes the map and side panels without inventing a custom frontend state machine for everything.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-aircraft-side-got-meaningfully-better&quot;&gt;The aircraft side got meaningfully better&lt;&#x2F;h2&gt;
&lt;p&gt;One of the newer pieces, and one of the most worthwhile ones, is the packaged static aircraft database.&lt;&#x2F;p&gt;
&lt;p&gt;Live ADS-B feeds are useful, but they are often incomplete. Registration might be missing. Aircraft type may be absent or inconsistent. Some of the details you actually want for a good operational dashboard are available, just not reliably from the live stream itself.&lt;&#x2F;p&gt;
&lt;p&gt;Hrafnsyn now supports a dump1090-derived NDJSON metadata artifact keyed by ICAO hex. It adds:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;registration overrides&lt;&#x2F;li&gt;
&lt;li&gt;aircraft type&lt;&#x2F;li&gt;
&lt;li&gt;type description&lt;&#x2F;li&gt;
&lt;li&gt;wake turbulence category&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The loader is intentionally simple:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;elixir&quot; class=&quot;language-elixir &quot;&gt;&lt;code class=&quot;language-elixir&quot; data-lang=&quot;elixir&quot;&gt;@default_record %{
  registration: nil,
  aircraft_type: nil,
  type_description: nil,
  wake_turbulence_category: nil
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And the enrichment precedence is exactly what you would hope:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;elixir&quot; class=&quot;language-elixir &quot;&gt;&lt;code class=&quot;language-elixir&quot; data-lang=&quot;elixir&quot;&gt;%__MODULE__{
  observation
  | registration: observation.registration || static.registration || derived.registration,
    aircraft_type: observation.aircraft_type || static.aircraft_type,
    type_description:
      observation.type_description || normalize_type_description(static.type_description),
    wake_turbulence_category:
      observation.wake_turbulence_category || static.wake_turbulence_category,
    country: observation.country || derived.country
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;In other words:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;live feed values win first&lt;&#x2F;li&gt;
&lt;li&gt;packaged static metadata fills the gaps second&lt;&#x2F;li&gt;
&lt;li&gt;ICAO-derived fallbacks come last&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That is a strong model because it improves aircraft records without pretending the static dataset is the source of truth for everything.&lt;&#x2F;p&gt;
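&lt;p&gt;The same precedence can be sketched in Python with illustrative names (&lt;code&gt;first_present&lt;&#x2F;code&gt;, &lt;code&gt;enrich&lt;&#x2F;code&gt;) and plain dicts instead of structs. One detail is worth copying. The Elixir &lt;code&gt;||&lt;&#x2F;code&gt; operator only skips &lt;code&gt;nil&lt;&#x2F;code&gt; and &lt;code&gt;false&lt;&#x2F;code&gt;, while Python &lt;code&gt;or&lt;&#x2F;code&gt; also skips empty strings and zero, so the helper checks for &lt;code&gt;None&lt;&#x2F;code&gt; explicitly. Only three fields are shown; the rest follow the same pattern.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;def first_present(*values):
    """Return the first value that is not None, like a chain of Elixir ||."""
    for value in values:
        if value is not None:
            return value
    return None


def enrich(observation, static, derived):
    """Live feed first, packaged static metadata second, ICAO-derived fallback last."""
    return {
        **observation,
        "registration": first_present(
            observation.get("registration"),
            static.get("registration"),
            derived.get("registration"),
        ),
        "aircraft_type": first_present(
            observation.get("aircraft_type"), static.get("aircraft_type")
        ),
        "country": first_present(observation.get("country"), derived.get("country")),
    }


# Made-up inputs: the live feed knows the type but not the registration.
live = {"icao": "abc123", "registration": None, "aircraft_type": "A20N"}
static = {"registration": "X-TEST", "aircraft_type": "A320"}
derived = {"registration": "X-DERIVED", "country": "Testland"}

record = enrich(live, static, derived)
print(record["registration"], record["aircraft_type"], record["country"])
# X-TEST A20N Testland
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;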
&lt;p&gt;More importantly, this database is not a random side file I copy into place by hand. It is a first-class flake output:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;aircraftDb = pkgs.runCommand &amp;quot;hrafnsyn-aircraft-db&amp;quot; {
  nativeBuildInputs = [ pkgs.python3 ];
} &amp;#x27;&amp;#x27;
  mkdir -p &amp;quot;$out&amp;#x2F;share&amp;#x2F;hrafnsyn&amp;quot;

  python ${.&amp;#x2F;scripts&amp;#x2F;build_aircraft_db.py} \
    ${dump1090Src} \
    &amp;quot;$out&amp;#x2F;share&amp;#x2F;hrafnsyn&amp;#x2F;aircraft-db.ndjson&amp;quot; \
    --metadata-output &amp;quot;$out&amp;#x2F;share&amp;#x2F;hrafnsyn&amp;#x2F;aircraft-db-metadata.json&amp;quot; \
    --source-revision ${dump1090Rev}
&amp;#x27;&amp;#x27;;

packages.&amp;quot;aircraft-db&amp;quot; = aircraftDb;
packages.aircraftDb = aircraftDb;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That makes a real difference. The app package and the enrichment package can move through the same Nix-native pipeline instead of one being declarative and the other being “well, remember to put this file on disk somewhere.”&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-local-developer-loop-is-unusually-friendly&quot;&gt;The local developer loop is unusually friendly&lt;&#x2F;h2&gt;
&lt;p&gt;I care a lot about whether a project feels easy to pick back up after not touching it for a week. Hrafnsyn does.&lt;&#x2F;p&gt;
&lt;p&gt;The shortest local path is:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;sh&quot; class=&quot;language-sh &quot;&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;nix develop
app
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And &lt;code&gt;app&lt;&#x2F;code&gt; is not magic. It is just a small helper that does the right boring things in the right order:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;app = pkgs.writeShellScriptBin &amp;quot;app&amp;quot; &amp;#x27;&amp;#x27;
  set -euo pipefail
  pg-reset
  pg-start
  eval &amp;quot;$(pg-env)&amp;quot;
  mix ecto.setup
  exec mix phx.server
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;There is still a manual path if I want it:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;sh&quot; class=&quot;language-sh &quot;&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;nix develop
pg-start
eval &amp;quot;$(pg-env)&amp;quot;
mix setup
mix phx.server
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;But the helper matters. The flake does not just build the release; it makes the local development loop pleasant too.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;two-clean-consumption-paths&quot;&gt;Two clean consumption paths&lt;&#x2F;h2&gt;
&lt;p&gt;This is one of the reasons I think Hrafnsyn is more interesting than a normal “here is my Phoenix app” project. It is packaged to be consumed the same way I want to consume other software myself.&lt;&#x2F;p&gt;
&lt;p&gt;The first route is direct: add the project as a flake input and use its package or NixOS module.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs.hrafnsyn.url = &amp;quot;github:ijohanne&amp;#x2F;hrafnsyn&amp;quot;;

  outputs = { nixpkgs, hrafnsyn, ... }: {
    nixosConfigurations.tracker = nixpkgs.lib.nixosSystem {
      system = &amp;quot;x86_64-linux&amp;quot;;
      modules = [
        hrafnsyn.nixosModules.default
        ({ pkgs, ... }: {
          services.hrafnsyn = {
            enable = true;
            package = hrafnsyn.packages.${pkgs.system}.default;
            aircraftDbPackage = hrafnsyn.packages.${pkgs.system}.aircraftDb;
          };
        })
      ];
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The second route is through &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;ijohanne&#x2F;nur-packages&quot;&gt;&lt;code&gt;ijohanne&#x2F;nur-packages&lt;&#x2F;code&gt;&lt;&#x2F;a&gt;, which exposes both the application and the aircraft metadata package:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;legacyPackages = forPackSystems (system:
  (import .&amp;#x2F;default.nix {
    pkgs = import nixpkgs { inherit system; };
    inherit sources;
  }) &amp;#x2F;&amp;#x2F; {
    hrafnsyn = inputs.hrafnsyn.packages.${system}.default;
    hrafnsyn-aircraft-db = inputs.hrafnsyn.packages.${system}.aircraftDb;
  });

nixosModules = (import .&amp;#x2F;modules) &amp;#x2F;&amp;#x2F; {
  hrafnsyn = inputs.hrafnsyn.nixosModules.default;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That gives consumers both paths:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;use &lt;code&gt;inputs.hrafnsyn&lt;&#x2F;code&gt; directly&lt;&#x2F;li&gt;
&lt;li&gt;use the curated NUR path for packages and module imports&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;I like that because it mirrors how I actually reuse personal infrastructure projects. Some are easier to consume directly. Some fit better through a curated package collection. Hrafnsyn does not force the choice.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-nixos-module-grew-into-a-real-deployment-story&quot;&gt;The NixOS module grew into a real deployment story&lt;&#x2F;h2&gt;
&lt;p&gt;This is the biggest reason the project feels complete now.&lt;&#x2F;p&gt;
&lt;p&gt;The module is no longer just enough to say “yes, you could probably run this on NixOS.” It supports the actual pieces that make a deployment feel native:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;structured &lt;code&gt;database&lt;&#x2F;code&gt; settings with &lt;code&gt;host&lt;&#x2F;code&gt;, &lt;code&gt;name&lt;&#x2F;code&gt;, &lt;code&gt;user&lt;&#x2F;code&gt;, and optional &lt;code&gt;passwordFile&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;aircraftDbPackage&lt;&#x2F;code&gt; as the preferred way to inject the packaged metadata artifact&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;aircraftDbPath&lt;&#x2F;code&gt; as a raw-path escape hatch when needed&lt;&#x2F;li&gt;
&lt;li&gt;structured &lt;code&gt;sources&lt;&#x2F;code&gt; rendered into &lt;code&gt;HRAFNSYN_SOURCES_JSON&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;optional nginx helper&lt;&#x2F;li&gt;
&lt;li&gt;optional dedicated metrics port&lt;&#x2F;li&gt;
&lt;li&gt;readonly-public versus authenticated-private mode&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The runtime contract in &lt;code&gt;config&#x2F;runtime.exs&lt;&#x2F;code&gt; matches that shape cleanly. For PostgreSQL, it accepts either a traditional &lt;code&gt;DATABASE_URL&lt;&#x2F;code&gt; or structured &lt;code&gt;DATABASE_HOST&lt;&#x2F;code&gt;, &lt;code&gt;DATABASE_NAME&lt;&#x2F;code&gt;, &lt;code&gt;DATABASE_USER&lt;&#x2F;code&gt;, and &lt;code&gt;DATABASE_PASSWORD&lt;&#x2F;code&gt;. For aircraft enrichment, it consumes &lt;code&gt;HRAFNSYN_AIRCRAFT_DB_PATH&lt;&#x2F;code&gt;. Nothing here feels bolted on.&lt;&#x2F;p&gt;
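&lt;p&gt;The following Python sketch shows the shape of that contract. The option names mirror Postgrex connection options (&lt;code&gt;url&lt;&#x2F;code&gt;, &lt;code&gt;hostname&lt;&#x2F;code&gt;, &lt;code&gt;socket_dir&lt;&#x2F;code&gt;, &lt;code&gt;database&lt;&#x2F;code&gt;, &lt;code&gt;username&lt;&#x2F;code&gt;, &lt;code&gt;password&lt;&#x2F;code&gt;). Two parts of it are assumptions rather than a description of the actual &lt;code&gt;runtime.exs&lt;&#x2F;code&gt;: the function name, and treating a host that starts with a slash as a socket directory.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;def resolve_repo_config(env):
    """Prefer DATABASE_URL; otherwise build options from the structured variables."""
    url = env.get("DATABASE_URL")
    if url:
        return {"url": url}

    config = {
        "database": env["DATABASE_NAME"],
        "username": env["DATABASE_USER"],
    }
    host = env["DATABASE_HOST"]
    if host.startswith("/"):
        # Assumed: an absolute path means a Unix socket directory.
        config["socket_dir"] = host
    else:
        config["hostname"] = host

    # Optional: local socket auth usually needs no password.
    password = env.get("DATABASE_PASSWORD")
    if password is not None:
        config["password"] = password
    return config


print(resolve_repo_config({
    "DATABASE_HOST": "/run/postgresql",
    "DATABASE_NAME": "hrafnsyn",
    "DATABASE_USER": "hrafnsyn",
}))
# {'database': 'hrafnsyn', 'username': 'hrafnsyn', 'socket_dir': '/run/postgresql'}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;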
&lt;p&gt;On NixOS, the structured local PostgreSQL path is especially nice because the module can create the database and user automatically, then enable &lt;code&gt;citext&lt;&#x2F;code&gt;, &lt;code&gt;pg_trgm&lt;&#x2F;code&gt;, and &lt;code&gt;postgis&lt;&#x2F;code&gt; before &lt;code&gt;hrafnsyn.service&lt;&#x2F;code&gt; starts. The nginx helper also does the right thing when the app binds wildcard addresses like &lt;code&gt;0.0.0.0&lt;&#x2F;code&gt;: nginx still proxies back over loopback instead of trying to hairpin through the public interface.&lt;&#x2F;p&gt;
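&lt;p&gt;The loopback rewrite is easy to picture as a small function. This is a hypothetical Python helper (&lt;code&gt;proxy_upstream&lt;&#x2F;code&gt;), not the module code, which is Nix. It only shows the mapping described above, with IPv6 handling added as an assumption.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;def proxy_upstream(listen_address, port):
    """URL nginx should proxy to for a given app bind address.

    Wildcard binds are mapped to loopback so nginx never hairpins through
    the public interface.
    """
    if listen_address == "0.0.0.0":
        host = "127.0.0.1"
    elif listen_address == "::":
        host = "[::1]"
    elif ":" in listen_address:
        host = f"[{listen_address}]"  # other IPv6 literals need brackets in a URL
    else:
        host = listen_address
    return f"http://{host}:{port}"


print(proxy_upstream("0.0.0.0", 4020))  # http://127.0.0.1:4020
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;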
&lt;p&gt;Here is a representative configuration:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  services.postgresql.enable = true;

  services.hrafnsyn = {
    enable = true;
    package = nurPackages.hrafnsyn;
    aircraftDbPackage = nurPackages.hrafnsyn-aircraft-db;

    host = &amp;quot;tracks.example.com&amp;quot;;
    port = 4020;
    metricsPort = 4022;
    listenAddress = &amp;quot;0.0.0.0&amp;quot;;
    autoMigrate = true;
    publicReadonly = false;

    database = {
      host = &amp;quot;&amp;#x2F;run&amp;#x2F;postgresql&amp;quot;;
      name = &amp;quot;hrafnsyn&amp;quot;;
      user = &amp;quot;hrafnsyn&amp;quot;;
    };

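    # A string, not a Nix path literal, so the secret is never copied into the world-readable Nix store.
    secretKeyBaseFile = &amp;quot;&amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;hrafnsyn-secret-key-base&amp;quot;;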

    sources = [
      {
        id = &amp;quot;planes-main&amp;quot;;
        name = &amp;quot;Airplane SDR&amp;quot;;
        vehicleType = &amp;quot;plane&amp;quot;;
        adapter = &amp;quot;dump1090&amp;quot;;
        baseUrl = &amp;quot;http:&amp;#x2F;&amp;#x2F;collector-a.internal&amp;quot;;
        pollIntervalMs = 1000;
      }
      {
        id = &amp;quot;boats-main&amp;quot;;
        name = &amp;quot;Boat SDR&amp;quot;;
        vehicleType = &amp;quot;vessel&amp;quot;;
        adapter = &amp;quot;ais_catcher&amp;quot;;
        baseUrl = &amp;quot;http:&amp;#x2F;&amp;#x2F;collector-b.internal:8100&amp;quot;;
        pollIntervalMs = 2500;
      }
    ];
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That is not hypothetical anymore. I am running a deployment in this style now, via the NUR module path, with managed secrets, local PostgreSQL socket auth, the packaged aircraft DB wired in declaratively, and the dashboard in authenticated&#x2F;private mode rather than public-readonly mode.&lt;&#x2F;p&gt;
&lt;p&gt;I am being intentionally vague about hostnames, addresses, and secret wiring, because there is no value in dumping private infrastructure details into a blog post. But the important part is that the deployment is real enough to prove the package and module interfaces are not decorative.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;public-display-mode-and-private-ops-mode-both-make-sense&quot;&gt;Public display mode and private ops mode both make sense&lt;&#x2F;h2&gt;
&lt;p&gt;I also like the auth split here because it matches the actual use cases.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;publicReadonly = true&lt;&#x2F;code&gt; is good for a publicly visible tracking screen or a household dashboard where anonymous viewers can look but not touch anything.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;publicReadonly = false&lt;&#x2F;code&gt; flips the posture. Anonymous web users are redirected to login, tracking gRPC calls require JWTs, and ingestion can require authenticated admin tokens. That is the mode I am using for the live deployment right now.&lt;&#x2F;p&gt;
&lt;p&gt;This is a nice example of a small configuration flag doing real design work. It is not “auth later.” The public and private operating modes are part of the application model.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;monitoring-is-part-of-the-project-not-an-afterthought&quot;&gt;Monitoring is part of the project, not an afterthought&lt;&#x2F;h2&gt;
&lt;p&gt;Hrafnsyn exposes &lt;code&gt;&#x2F;metrics&lt;&#x2F;code&gt; on the main endpoint and can split metrics onto a dedicated port with &lt;code&gt;metricsPort&lt;&#x2F;code&gt;. The module can also append a Prometheus scrape job and provision the bundled Grafana dashboard.&lt;&#x2F;p&gt;
&lt;p&gt;That matters because it pushes the project over the line from neat UI into real service. I can deploy it, scrape it, and drop the shipped dashboard into Grafana instead of promising myself I will “add observability later.”&lt;&#x2F;p&gt;
&lt;p&gt;Again, that is not theoretical for me anymore. The live deployment is scraped by Prometheus on its dedicated metrics port, and the bundled Hrafnsyn Grafana dashboard is provisioned from the repo.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-i-think-this-project-is-satisfying&quot;&gt;Why I think this project is satisfying&lt;&#x2F;h2&gt;
&lt;p&gt;The fun part of Hrafnsyn is not just that it shows planes and boats together. It is that several small systems decisions reinforce each other cleanly:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;collectors stay simple&lt;&#x2F;li&gt;
&lt;li&gt;ingest stays centralized&lt;&#x2F;li&gt;
&lt;li&gt;storage keeps both current state and history&lt;&#x2F;li&gt;
&lt;li&gt;LiveView gets realtime updates without frontend contortions&lt;&#x2F;li&gt;
&lt;li&gt;packaging covers both the app and the aircraft metadata artifact&lt;&#x2F;li&gt;
&lt;li&gt;deployment works directly or through NUR&lt;&#x2F;li&gt;
&lt;li&gt;auth can be public-readonly or intentionally private&lt;&#x2F;li&gt;
&lt;li&gt;monitoring is already part of the story&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That is the kind of software project I enjoy most. Not a giant platform. Not a throwaway demo. Just one coherent codebase where UI, ingest, storage, enrichment, auth, monitoring, and deployment all belong to the same idea.&lt;&#x2F;p&gt;
&lt;p&gt;And now that it is deployed for real, I find it much easier to argue that Hrafnsyn is interesting for the right reasons. It is not simply “look, I put a map on the web.” It is a practical Phoenix and Nix system with a clean boundary design, a nice operational shape, and just enough ambition to stay fun.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Building a Stratum 1 NTP Server with GPS&#x2F;PPS on Raspberry Pi</title>
        <published>2026-04-10T12:00:00+00:00</published>
        <updated>2026-04-10T12:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/stratum-1-ntp-server-gps-pps-raspberry-pi/"/>
        <id>https://perlpimp.net/blog/stratum-1-ntp-server-gps-pps-raspberry-pi/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/stratum-1-ntp-server-gps-pps-raspberry-pi/">&lt;p&gt;If you already run a homelab, eventually you start wanting your time to come from inside the house instead of “some pool servers on the internet, probably.” Not because pool.ntp.org is bad. It is excellent. But because once you have Prometheus, Grafana, UPS-backed hosts, and a mild infrastructure problem, the next logical step is apparently “build a stratum 1 clock.”&lt;&#x2F;p&gt;
&lt;p&gt;The good news is that this is entirely doable with a Raspberry Pi 4, a decent GNSS receiver, and chrony. The less good news is that the obvious path through &lt;code&gt;ntpd&lt;&#x2F;code&gt; is a swamp of half-working refclock drivers, shared memory weirdness, and one especially annoying circular dependency. I tried the obvious thing so you do not have to.&lt;&#x2F;p&gt;
&lt;p&gt;The setup I ended up with:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Raspberry Pi 4 running Arch Linux ARM&lt;&#x2F;li&gt;
&lt;li&gt;u-blox ZED-F9P GNSS module for serial time + PPS&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;gpsd&lt;&#x2F;code&gt; for GPS data and coarse time&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;chrony&lt;&#x2F;code&gt; for actual clock discipline&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;chrony_exporter&lt;&#x2F;code&gt;, Prometheus, and Grafana for visibility&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The end result is a LAN-served stratum 1 NTP server with roughly 150-400ns offset from UTC once it has settled. Which is a mildly absurd amount of precision for a thing sitting next to a router, but that is part of the charm.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;hardware&quot;&gt;Hardware&lt;&#x2F;h2&gt;
&lt;p&gt;The hardware is simple:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Raspberry Pi 4 running 64-bit Arch Linux ARM&lt;&#x2F;li&gt;
&lt;li&gt;u-blox ZED-F9P multi-constellation GNSS receiver&lt;&#x2F;li&gt;
&lt;li&gt;USB serial link, with the receiver showing up as &lt;code&gt;&#x2F;dev&#x2F;ttyACM0&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;PPS wiring exposed as &lt;code&gt;&#x2F;dev&#x2F;pps0&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;I symlinked the serial device to &lt;code&gt;&#x2F;dev&#x2F;gps0&lt;&#x2F;code&gt; with udev because device names that change under you are funny exactly once.&lt;&#x2F;p&gt;
&lt;p&gt;The software stack for the working setup:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;gpsd 3.26.1&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;chrony 4.8&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;chrony_exporter 0.11.0&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Prometheus on a separate monitoring host&lt;&#x2F;li&gt;
&lt;li&gt;Grafana on that same monitoring host&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;why-gps-time-is-two-signals-not-one&quot;&gt;Why GPS Time Is Two Signals, Not One&lt;&#x2F;h2&gt;
&lt;p&gt;The important conceptual piece is that GPS timekeeping is really two separate signals:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;NMEA or UBX messages tell you which second it is.&lt;&#x2F;li&gt;
&lt;li&gt;PPS tells you exactly when that second starts.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Those are not the same job.&lt;&#x2F;p&gt;
&lt;p&gt;The serial GPS stream gives you timestamps, but they arrive through USB, the kernel, buffering, and userspace parsing. That makes them accurate in the “yes, it is definitely this second” sense, but jittery by tens of milliseconds.&lt;&#x2F;p&gt;
&lt;p&gt;PPS, short for pulse per second, is a hardware edge emitted exactly on the second boundary. The kernel timestamps that edge with far better precision than userspace can manage. PPS is what gets you from “roughly correct” to “this box is now taking nanoseconds personally.”&lt;&#x2F;p&gt;
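&lt;p&gt;Together, the two signals divide the work. The serial time only has to be right to within half a second to pick which second boundary a pulse belongs to, and the pulse supplies the precise instant. Below is a minimal Python sketch of that idea, using a hypothetical &lt;code&gt;pps_offset_ns&lt;&#x2F;code&gt; helper. It illustrates the concept behind the chrony &lt;code&gt;lock&lt;&#x2F;code&gt; option, not chrony internals. It works in integer nanoseconds because a float64 holding seconds near 1.7e9 can only resolve steps of about 0.24 microseconds.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;NS_PER_S = 1_000_000_000


def pps_offset_ns(pps_edge_ns, coarse_estimate_ns):
    """Return true time minus local clock at a PPS edge, in nanoseconds.

    pps_edge_ns: kernel timestamp of the pulse, read on the local system clock.
    coarse_estimate_ns: serial GPS estimate of the true time of the same edge;
    it only needs to be within half a second of the truth.
    """
    # The edge sits exactly on a second boundary, so the noisy serial time
    # is only used to choose which boundary: round it to the nearest second.
    true_second = (coarse_estimate_ns + NS_PER_S // 2) // NS_PER_S
    return true_second * NS_PER_S - pps_edge_ns


# Pulse timestamp as in the ppstest sample output; the serial estimate is
# made up to be 54 ms late, well within the half-second budget.
edge = 1775128192_913287329
serial = 1775128193_054000000
print(pps_offset_ns(edge, serial))  # 86712671 (local clock ~86.7 ms behind)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;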
&lt;p&gt;The data path looks like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;GNSS receiver --USB serial--&amp;gt; gpsd --SHM NTP0--&amp;gt; chrony
GNSS receiver --PPS GPIO-----&amp;gt; kernel PPS API (&amp;#x2F;dev&amp;#x2F;pps0) --&amp;gt; chrony
chrony --&amp;gt; LAN clients
chrony_exporter --&amp;gt; Prometheus --&amp;gt; Grafana
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;gpsd&lt;&#x2F;code&gt; also writes PPS into SHM, but that turns out to be the wrong place to consume it from. More on that in the &lt;code&gt;ntpd&lt;&#x2F;code&gt; war story section.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;shm-segments-from-gpsd&quot;&gt;SHM segments from gpsd&lt;&#x2F;h3&gt;
&lt;p&gt;If you inspect shared memory, gpsd creates segments for NTP consumers:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Key&lt;&#x2F;th&gt;&lt;th&gt;Segment&lt;&#x2F;th&gt;&lt;th&gt;Permissions&lt;&#x2F;th&gt;&lt;th&gt;Purpose&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;0x4e545030&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;NTP0&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;600&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Coarse GPS time&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;0x4e545031&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;NTP1&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;600&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;PPS for privileged consumers&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;0x4e545032&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;NTP2&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;666&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;PPS for unprivileged consumers&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;That permission split matters. &lt;code&gt;NTP0&lt;&#x2F;code&gt; is root-only. If your time daemon runs as an unprivileged user, life gets interesting in the bad way.&lt;&#x2F;p&gt;
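&lt;p&gt;Those keys are not arbitrary. &lt;code&gt;0x4e545030&lt;&#x2F;code&gt; is the ASCII bytes &lt;code&gt;NTP0&lt;&#x2F;code&gt; read as a big-endian integer, and each further unit adds one. Below is a small Python sketch of that arithmetic, plus the check that separates the &lt;code&gt;600&lt;&#x2F;code&gt; segments from the &lt;code&gt;666&lt;&#x2F;code&gt; one. The function names are illustrative.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;NTP_SHM_BASE = 0x4E545030  # b"NTP0" as a big-endian 32-bit integer


def shm_key(unit):
    """SysV shared-memory key for NTP SHM unit N (NTP0, NTP1, ...)."""
    return NTP_SHM_BASE + unit


def key_name(key):
    """Turn a key back into its four ASCII characters."""
    return key.to_bytes(4, "big").decode("ascii")


def others_can_read(mode):
    """True if the 'other' read bit (0o004) is set: 0o666 yes, 0o600 no."""
    return mode % 8 >= 4  # the last octal digit holds the 'other' permissions


for unit, mode in [(0, 0o600), (1, 0o600), (2, 0o666)]:
    key = shm_key(unit)
    print(hex(key), key_name(key), "world-readable" if others_can_read(mode) else "owner-only")
# 0x4e545030 NTP0 owner-only
# 0x4e545031 NTP1 owner-only
# 0x4e545032 NTP2 world-readable
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;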
&lt;h2 id=&quot;getting-gpsd-to-behave&quot;&gt;Getting gpsd To Behave&lt;&#x2F;h2&gt;
&lt;p&gt;On a distro that uses &lt;code&gt;&#x2F;etc&#x2F;default&#x2F;gpsd&lt;&#x2F;code&gt;, the config is straightforward:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;START_DAEMON=&amp;quot;true&amp;quot;
GPSD_OPTIONS=&amp;quot;-n&amp;quot;
DEVICES=&amp;quot;&amp;#x2F;dev&amp;#x2F;gps0 &amp;#x2F;dev&amp;#x2F;pps0&amp;quot;
USBAUTO=&amp;quot;true&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Two bits matter most:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;-n&lt;&#x2F;code&gt; makes &lt;code&gt;gpsd&lt;&#x2F;code&gt; start polling immediately instead of waiting for a client.&lt;&#x2F;li&gt;
&lt;li&gt;You must list both the serial device and the PPS device.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Once that is in place, the debugging commands you actually care about are:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Is gpsd alive?
systemctl status gpsd
ps aux | grep gpsd

# Do you have a fix?
cgps -s

# Raw interactive view
gpsmon

# Stream raw JSON
gpspipe -w | head -20

# Specifically look for PPS events
gpspipe -w | grep PPS

# See SHM segments
ipcs -m
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;what-healthy-gpsd-output-looks-like&quot;&gt;What healthy gpsd output looks like&lt;&#x2F;h3&gt;
&lt;p&gt;A normal TPV message looks something like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;json&quot; class=&quot;language-json &quot;&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;{&amp;quot;class&amp;quot;:&amp;quot;TPV&amp;quot;,&amp;quot;device&amp;quot;:&amp;quot;&amp;#x2F;dev&amp;#x2F;gps0&amp;quot;,&amp;quot;status&amp;quot;:2,&amp;quot;mode&amp;quot;:3,
 &amp;quot;time&amp;quot;:&amp;quot;2026-04-02T11:23:02.000Z&amp;quot;,&amp;quot;lat&amp;quot;:36.42399,&amp;quot;lon&amp;quot;:-5.15240,...}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The fields worth caring about:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;mode=3&lt;&#x2F;code&gt; means you have a 3D fix.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;status=2&lt;&#x2F;code&gt; means DGPS-quality fix.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;time&lt;&#x2F;code&gt; is the coarse second count chrony will use as an anchor.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
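&lt;p&gt;To run that check in a script rather than by eye, you can consume the JSON stream directly. Below is a Python sketch with an illustrative &lt;code&gt;latest_fix&lt;&#x2F;code&gt; function. It keeps the last TPV report from &lt;code&gt;gpspipe -w&lt;&#x2F;code&gt; style lines and skips anything that does not parse. The sample lines are trimmed versions of the messages shown in this post.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;import json


def latest_fix(lines):
    """Return (mode, status, time) from the last TPV report, or None."""
    fix = None
    for line in lines:
        try:
            msg = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial lines and non-JSON noise
        if isinstance(msg, dict) and msg.get("class") == "TPV":
            fix = (msg.get("mode"), msg.get("status"), msg.get("time"))
    return fix


sample = [
    '{"class":"TPV","device":"/dev/gps0","status":2,"mode":3,"time":"2026-04-02T11:23:02.000Z"}',
    '{"class":"PPS","device":"/dev/pps0","real_sec":1775128983,"real_nsec":0}',
    "not json",
]
mode, status, when = latest_fix(sample)
print("3D fix" if mode == 3 else "no 3D fix", "at", when)
# 3D fix at 2026-04-02T11:23:02.000Z
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;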
&lt;p&gt;A PPS message looks like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;json&quot; class=&quot;language-json &quot;&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;{&amp;quot;class&amp;quot;:&amp;quot;PPS&amp;quot;,&amp;quot;device&amp;quot;:&amp;quot;&amp;#x2F;dev&amp;#x2F;pps0&amp;quot;,&amp;quot;real_sec&amp;quot;:1775128983,&amp;quot;real_nsec&amp;quot;:0,
 &amp;quot;clock_sec&amp;quot;:1775128982,&amp;quot;clock_nsec&amp;quot;:999955940,&amp;quot;precision&amp;quot;:-20,&amp;quot;shm&amp;quot;:&amp;quot;NTP2&amp;quot;}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The important part here is the split between the true GPS second and the system clock’s view of when that edge happened. If the machine clock is off, those numbers will disagree. That becomes important later.&lt;&#x2F;p&gt;
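&lt;p&gt;That comparison reduces to simple arithmetic: subtract the system clock reading from the true GPS second, in integer nanoseconds. Here is a Python sketch with an illustrative function name, applied to the PPS message above:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;import json

NS_PER_S = 1_000_000_000


def pps_clock_error_ns(pps_json):
    """True GPS second minus the system clock reading at the pulse, in ns.

    Positive means the system clock was behind when the edge arrived.
    """
    msg = json.loads(pps_json)
    real_ns = msg["real_sec"] * NS_PER_S + msg["real_nsec"]
    clock_ns = msg["clock_sec"] * NS_PER_S + msg["clock_nsec"]
    return real_ns - clock_ns


message = (
    '{"class":"PPS","device":"/dev/pps0","real_sec":1775128983,"real_nsec":0,'
    '"clock_sec":1775128982,"clock_nsec":999955940,"precision":-20,"shm":"NTP2"}'
)
print(pps_clock_error_ns(message))  # 44060, so the clock was ~44 microseconds behind
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;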
&lt;h3 id=&quot;common-gpsd-potholes&quot;&gt;Common gpsd potholes&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;No serial data at all: verify you are reading the right device, and if the receiver is wired to the Pi GPIO UART rather than USB, make sure the serial login console is not squatting on that UART.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;gpsmon&lt;&#x2F;code&gt; looks empty: often a display quirk, not a gpsd failure. If &lt;code&gt;cgps -s&lt;&#x2F;code&gt; works, you are probably fine.&lt;&#x2F;li&gt;
&lt;li&gt;First fix takes forever: a cold receiver with no almanac can take a while. Outdoors with clear sky, the ZED-F9P is usually much faster.&lt;&#x2F;li&gt;
&lt;li&gt;SHM permissions look wrong: they usually look that way by design, because gpsd makes some segments root-only and others world-accessible.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;pps-the-part-that-actually-makes-this-precise&quot;&gt;PPS: The Part That Actually Makes This Precise&lt;&#x2F;h2&gt;
&lt;p&gt;Before touching NTP at all, verify that PPS is really arriving:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;ls -la &amp;#x2F;dev&amp;#x2F;pps0
sudo ppstest &amp;#x2F;dev&amp;#x2F;pps0
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Healthy &lt;code&gt;ppstest&lt;&#x2F;code&gt; output looks like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;source 0 - assert 1775128192.913287329, sequence: 12896874 - clear  0.000000000, sequence: 0
source 0 - assert 1775128193.913286383, sequence: 12896875 - clear  0.000000000, sequence: 0
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;What you want:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;timestamps almost exactly one second apart&lt;&#x2F;li&gt;
&lt;li&gt;steadily increasing sequence numbers&lt;&#x2F;li&gt;
&lt;li&gt;very little jitter between pulses&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
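&lt;p&gt;Those checks are easy to automate. Below is a Python sketch with illustrative names. It pulls the assert timestamps and sequence numbers out of &lt;code&gt;ppstest&lt;&#x2F;code&gt; lines and reports the interval between consecutive pulses, using the two sample lines above:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;import re

NS_PER_S = 1_000_000_000
# Matches e.g. "assert 1775128192.913287329, sequence: 12896874"
ASSERT_RE = re.compile(r"assert (\d+)\.(\d{9}), sequence: (\d+)")


def pulse_intervals(lines):
    """Yield (interval_ns, sequence_step) for consecutive ppstest assert events."""
    previous = None
    for line in lines:
        match = ASSERT_RE.search(line)
        if match is None:
            continue
        ts_ns = int(match.group(1)) * NS_PER_S + int(match.group(2))
        seq = int(match.group(3))
        if previous is not None:
            yield ts_ns - previous[0], seq - previous[1]
        previous = (ts_ns, seq)


sample = [
    "source 0 - assert 1775128192.913287329, sequence: 12896874 - clear  0.000000000, sequence: 0",
    "source 0 - assert 1775128193.913286383, sequence: 12896875 - clear  0.000000000, sequence: 0",
]
for interval_ns, step in pulse_intervals(sample):
    print(f"interval {interval_ns} ns, {interval_ns - NS_PER_S:+d} ns from 1 s, sequence +{step}")
# interval 999999054 ns, -946 ns from 1 s, sequence +1
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;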
&lt;p&gt;What you do not want is no output at all, because that means your wiring, GPIO config, or receiver PPS output is lying to you.&lt;&#x2F;p&gt;
&lt;p&gt;If &lt;code&gt;&#x2F;dev&#x2F;pps0&lt;&#x2F;code&gt; does not exist, stop there and fix the kernel&#x2F;device-tree side first. No NTP daemon can save you from a missing pulse input.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-ntpd-dead-end&quot;&gt;The ntpd Dead End&lt;&#x2F;h2&gt;
&lt;p&gt;Naturally, I started with &lt;code&gt;ntpd&lt;&#x2F;code&gt;, because that is what a lot of old GPS&#x2F;NTP docs still point at. This was a mistake.&lt;&#x2F;p&gt;
&lt;p&gt;The first attempt was the classic shared-memory plus PPS refclock setup:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;conf&quot; class=&quot;language-conf &quot;&gt;&lt;code class=&quot;language-conf&quot; data-lang=&quot;conf&quot;&gt;server 127.127.28.0 minpoll 4 maxpoll 4 prefer
fudge 127.127.28.0 refid GPS

server 127.127.22.0 minpoll 4 maxpoll 4
fudge 127.127.22.0 refid PPS
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That lasted right up until the logs said:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;refclock_newpeer: clock type 22 invalid
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;So the ATOM driver was not even compiled into the Arch Linux ARM package. Excellent start. The driver that is supposed to read PPS directly from the kernel was simply unavailable.&lt;&#x2F;p&gt;
&lt;p&gt;I then tried the GPSD JSON driver. Also bad. It connected, then mostly sulked, and eventually produced the usual &lt;code&gt;clk_no_reply&lt;&#x2F;code&gt; style misery.&lt;&#x2F;p&gt;
&lt;p&gt;That left SHM.&lt;&#x2F;p&gt;
&lt;p&gt;SHM kind of works, in the same way a chair with three legs kind of works if you are very still. The problem is this:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;gpsd&lt;&#x2F;code&gt; timestamps PPS events for SHM using the system clock.&lt;&#x2F;li&gt;
&lt;li&gt;Before the time daemon has disciplined the system clock, that clock is wrong.&lt;&#x2F;li&gt;
&lt;li&gt;Therefore PPS-via-SHM is wrong by exactly the amount you need PPS to fix.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;In my case the PPS SHM source sat around 57ms off and would not converge. That is the circular dependency:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;you need accurate PPS to fix the clock&lt;&#x2F;li&gt;
&lt;li&gt;but PPS through SHM depends on the clock already being accurate&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Then there was the permissions issue layered on top:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;NTP0&lt;&#x2F;code&gt; is root-only&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;ntpd&lt;&#x2F;code&gt; runs as &lt;code&gt;ntp&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;the daemon can attach to the segment, but that does not mean it can read it reliably enough to build a clean solution&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;At that point &lt;code&gt;ntpd&lt;&#x2F;code&gt; had consumed enough of my evening and earned nothing further.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;chrony-saves-the-day&quot;&gt;chrony Saves The Day&lt;&#x2F;h2&gt;
&lt;p&gt;The reason chrony works is very simple: it does not insist on walking through the swamp.&lt;&#x2F;p&gt;
&lt;p&gt;This line is the whole difference:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;conf&quot; class=&quot;language-conf &quot;&gt;&lt;code class=&quot;language-conf&quot; data-lang=&quot;conf&quot;&gt;refclock PPS &amp;#x2F;dev&amp;#x2F;pps0 refid PPS lock GPS
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;chrony reads PPS directly from the kernel PPS API through &lt;code&gt;&#x2F;dev&#x2F;pps0&lt;&#x2F;code&gt;. No gpsd SHM timestamping, no circular dependency, no drama. The kernel timestamps the interrupt edge. That is the thing you wanted all along.&lt;&#x2F;p&gt;
&lt;p&gt;The full working &lt;code&gt;chrony.conf&lt;&#x2F;code&gt; looked like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;conf&quot; class=&quot;language-conf &quot;&gt;&lt;code class=&quot;language-conf&quot; data-lang=&quot;conf&quot;&gt;refclock SHM 0 refid GPS precision 1e-1 offset 0.0 delay 0.2
refclock PPS &amp;#x2F;dev&amp;#x2F;pps0 refid PPS lock GPS
server 0.arch.pool.ntp.org iburst
server 1.arch.pool.ntp.org iburst
server 2.arch.pool.ntp.org iburst
server 3.arch.pool.ntp.org iburst
driftfile &amp;#x2F;var&amp;#x2F;lib&amp;#x2F;chrony&amp;#x2F;drift
allow 10.255.0.0&amp;#x2F;16
makestep 1 3
rtcsync
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;What each piece is doing:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;refclock SHM 0&lt;&#x2F;code&gt; reads coarse GPS time from gpsd. That is the “which second is this?” source.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;refclock PPS &#x2F;dev&#x2F;pps0 lock GPS&lt;&#x2F;code&gt; reads the precise edge from the kernel and ties it to the GPS second numbering.&lt;&#x2F;li&gt;
&lt;li&gt;pool servers are there as sanity and fallback, not as the primary source.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;allow&lt;&#x2F;code&gt; lets LAN clients use the Pi as their NTP server.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;makestep 1 3&lt;&#x2F;code&gt; lets chrony step a badly wrong clock at startup whenever the offset exceeds one second during the first three clock updates, instead of spending half a lifetime slewing it back into reality.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
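&lt;p&gt;The chrony documentation describes &lt;code&gt;makestep threshold limit&lt;&#x2F;code&gt; as stepping the clock when the adjustment is larger than &lt;code&gt;threshold&lt;&#x2F;code&gt; seconds, but only within the first &lt;code&gt;limit&lt;&#x2F;code&gt; clock updates after chronyd starts. Outside that window, chrony slews. Below is a Python sketch of that decision. The function name is illustrative, and the sketch only models a positive limit like the one used here.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;python&quot; class=&quot;language-python &quot;&gt;&lt;code class=&quot;language-python&quot; data-lang=&quot;python&quot;&gt;def makestep_action(offset_s, update_number, threshold_s=1.0, limit=3):
    """Step or slew, following `makestep threshold limit` (defaults: makestep 1 3).

    update_number counts clock updates since chronyd started, starting at 1.
    """
    if update_number > limit:
        return "slew"  # past the startup window: always correct gradually
    return "step" if abs(offset_s) > threshold_s else "slew"


for offset, update in [(90.0, 1), (0.4, 1), (90.0, 4)]:
    print(offset, update, makestep_action(offset, update))
# 90.0 1 step
# 0.4 1 slew
# 90.0 4 slew
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;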
&lt;h3 id=&quot;the-two-chronyc-commands-that-matter&quot;&gt;The two chronyc commands that matter&lt;&#x2F;h3&gt;
&lt;p&gt;First:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;chronyc sources -v
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Healthy output:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;MS Name&amp;#x2F;IP address         Stratum Poll Reach LastRx Last sample
===============================================================================
#- GPS                           0   4   377    24    +54ms[  +54ms] +&amp;#x2F;-  200ms
#* PPS                           0   4   377    22   -153ns[-1337ns] +&amp;#x2F;-  334ns
^- tock.espanix.net              1   6    77    26   -554us[ -555us] +&amp;#x2F;- 6323us
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This is exactly what you want to see:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;GPS&lt;&#x2F;code&gt; is noisy by tens of milliseconds, because serial time is supposed to be noisy.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;PPS&lt;&#x2F;code&gt; is sitting down in the nanoseconds, because that is the actual precision source.&lt;&#x2F;li&gt;
&lt;li&gt;internet peers are reachable but clearly not what the clock is following.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
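&lt;p&gt;The &lt;code&gt;Reach&lt;&#x2F;code&gt; column is worth decoding too. chronyc prints it as an octal number holding an 8-bit register of the last eight polls, so &lt;code&gt;377&lt;&#x2F;code&gt; means every one of them got a valid reply, while the &lt;code&gt;77&lt;&#x2F;code&gt; on &lt;code&gt;tock.espanix.net&lt;&#x2F;code&gt; means only the six most recent did. A small sketch for turning it into bits (the function name is only for illustration):&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Expand chronyc&amp;#x27;s octal Reach value into the 8-bit poll history,
# oldest poll on the left, newest on the right (1 = valid reply received).
reach_bits() {
  local n bits i
  n=$(printf &amp;#x27;%d&amp;#x27; &amp;quot;0$1&amp;quot;)   # the leading 0 makes printf read it as octal
  bits=&amp;quot;&amp;quot;
  for i in 7 6 5 4 3 2 1 0; do
    bits=&amp;quot;$bits$(( (n &amp;gt;&amp;gt; i) &amp;amp; 1 ))&amp;quot;
  done
  echo &amp;quot;$bits&amp;quot;
}

reach_bits 377   # prints 11111111
reach_bits 77    # prints 00111111
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;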
&lt;p&gt;Then:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;chronyc tracking
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Example output:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;Reference ID    : 50505300 (PPS)
Stratum         : 1
System time     : 0.000000313 seconds slow of NTP time
Last offset     : -0.000000397 seconds
RMS offset      : 0.000004104 seconds
Frequency       : 5.823 ppm fast
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The headline numbers:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Reference ID : PPS&lt;&#x2F;code&gt; means the Pi is locked to the pulse source, not a remote server.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;Stratum : 1&lt;&#x2F;code&gt; means it is directly attached to a reference clock.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;System time&lt;&#x2F;code&gt; in the sub-microsecond range means the whole thing is doing what you built it to do.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
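&lt;p&gt;Those three checks are easy to script. A minimal sketch, assuming the &lt;code&gt;chronyc tracking&lt;&#x2F;code&gt; field labels shown above; the function name is hypothetical, and the 1 µs budget for &lt;code&gt;System time&lt;&#x2F;code&gt; is an assumed threshold, not something chrony enforces:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Read &amp;quot;chronyc tracking&amp;quot; output on stdin and exit 0 only if the reference
# is PPS, the stratum is 1 and System time is under an assumed 1 microsecond.
# chronyc prints System time as a magnitude followed by &amp;quot;slow&amp;quot; or &amp;quot;fast&amp;quot;,
# so the first number on that line is already an absolute offset.
tracking_ok() {
  awk -F&amp;#x27; *: *&amp;#x27; &amp;#x27;
    $1 == &amp;quot;Reference ID&amp;quot; { ref = $2 }
    $1 == &amp;quot;Stratum&amp;quot; { stratum = $2 }
    $1 == &amp;quot;System time&amp;quot; { split($2, f, &amp;quot; &amp;quot;); offset = f[1] + 0; seen = 1 }
    END {
      ok = (ref ~ &amp;#x2F;[(]PPS[)]$&amp;#x2F;) &amp;amp;&amp;amp; (stratum == 1) &amp;amp;&amp;amp; seen &amp;amp;&amp;amp; (offset &amp;lt; 0.000001)
      exit (ok ? 0 : 1)
    }
  &amp;#x27;
}

# Usage on the Pi:
#   chronyc tracking | tracking_ok &amp;amp;&amp;amp; echo &amp;quot;locked to PPS&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;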
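&lt;p&gt;The 5-6ppm oscillator drift (&lt;code&gt;5.823 ppm fast&lt;&#x2F;code&gt; above) is also completely normal on a Raspberry Pi, and chrony measures it and corrects for it continuously. The board has no fancy oven-controlled oscillator; it is a tiny computer that mostly wants to run Linux and warm up slightly.&lt;&#x2F;p&gt;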
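&lt;p&gt;To put that figure in perspective: 1 ppm is 1 µs gained or lost every second, so a clock running 5.823 ppm fast with nothing correcting it would gain 5.823 × 86400 µs, or roughly half a second, per day. The conversion as a one-line helper (the name is made up for illustration):&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Convert a frequency error in ppm into seconds gained or lost per day:
# 1 ppm is 1 microsecond per second, and a day has 86400 seconds.
ppm_to_seconds_per_day() {
  awk -v ppm=&amp;quot;$1&amp;quot; &amp;#x27;BEGIN { printf &amp;quot;%.3f\n&amp;quot;, ppm * 86400 &amp;#x2F; 1000000 }&amp;#x27;
}

ppm_to_seconds_per_day 5.823   # prints 0.503
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;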
&lt;h3 id=&quot;if-chrony-looks-wrong&quot;&gt;If chrony looks wrong&lt;&#x2F;h3&gt;
&lt;p&gt;The usual failure modes are mercifully sane:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;PPS shows &lt;code&gt;?&lt;&#x2F;code&gt;: chrony has no coarse time source to lock PPS to, so go fix gpsd&#x2F;SHM first.&lt;&#x2F;li&gt;
&lt;li&gt;GPS offset is 50-100ms: that is normal, not a bug.&lt;&#x2F;li&gt;
&lt;li&gt;Large startup step: normal if the box booted with stale RTC state or no RTC at all. That is what &lt;code&gt;makestep&lt;&#x2F;code&gt; is for.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
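&lt;p&gt;The first case is also the easiest to catch from a script: look at the state character &lt;code&gt;chronyc sources&lt;&#x2F;code&gt; prints next to each source. A minimal sketch, assuming the plain-text layout shown earlier and refclocks named &lt;code&gt;GPS&lt;&#x2F;code&gt; and &lt;code&gt;PPS&lt;&#x2F;code&gt;; the helper name is illustrative:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Print the state character chronyc reports for one named source:
# * selected, + combined, - not combined, x may be in error,
# ~ too variable, ? unusable.
source_state() {
  awk -v name=&amp;quot;$1&amp;quot; &amp;#x27;$2 == name { print substr($1, 2, 1) }&amp;#x27;
}

# Usage on the Pi; a &amp;quot;?&amp;quot; for PPS means go and fix gpsd&amp;#x2F;SHM first:
#   chronyc sources | source_state PPS
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;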
&lt;h2 id=&quot;exporting-chrony-to-prometheus&quot;&gt;Exporting chrony to Prometheus&lt;&#x2F;h2&gt;
&lt;p&gt;Once the clock itself works, you want to stop SSHing in every time you wonder whether it still works.&lt;&#x2F;p&gt;
&lt;p&gt;I used &lt;code&gt;chrony_exporter 0.11.0&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;curl -LO https:&amp;#x2F;&amp;#x2F;github.com&amp;#x2F;SuperQ&amp;#x2F;chrony_exporter&amp;#x2F;releases&amp;#x2F;download&amp;#x2F;v0.11.0&amp;#x2F;chrony_exporter-0.11.0.linux-arm64.tar.gz
tar xzf chrony_exporter-0.11.0.linux-arm64.tar.gz
sudo cp chrony_exporter-0.11.0.linux-arm64&amp;#x2F;chrony_exporter &amp;#x2F;usr&amp;#x2F;local&amp;#x2F;bin&amp;#x2F;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The systemd unit:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;ini&quot; class=&quot;language-ini &quot;&gt;&lt;code class=&quot;language-ini&quot; data-lang=&quot;ini&quot;&gt;[Unit]
Description=Chrony Exporter
After=chronyd.service

[Service]
User=root
ExecStart=&amp;#x2F;usr&amp;#x2F;local&amp;#x2F;bin&amp;#x2F;chrony_exporter \
  --collector.sources \
  --collector.serverstats \
  --collector.tracking \
  --chrony.address=unix:&amp;#x2F;&amp;#x2F;&amp;#x2F;run&amp;#x2F;chrony&amp;#x2F;chronyd.sock \
  --collector.chmod-socket
Restart=always

[Install]
WantedBy=multi-user.target
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
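&lt;p&gt;Loading it is the usual systemd routine. A sketch, assuming the unit is saved as &lt;code&gt;&#x2F;etc&#x2F;systemd&#x2F;system&#x2F;chrony-exporter.service&lt;&#x2F;code&gt;, which matches the &lt;code&gt;chrony-exporter&lt;&#x2F;code&gt; name used in the checks below:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Make systemd read the new unit, then start it now and at every boot.
sudo systemctl daemon-reload
sudo systemctl enable --now chrony-exporter
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;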
&lt;h3 id=&quot;why-the-unix-socket-matters&quot;&gt;Why the Unix socket matters&lt;&#x2F;h3&gt;
&lt;p&gt;The exporter can talk to chronyd in two ways:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;UDP command port (&lt;code&gt;127.0.0.1:323&lt;&#x2F;code&gt;)&lt;&#x2F;li&gt;
&lt;li&gt;Unix socket (&lt;code&gt;&#x2F;run&#x2F;chrony&#x2F;chronyd.sock&lt;&#x2F;code&gt;)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;UDP sounds simpler. It is not simpler.&lt;&#x2F;p&gt;
&lt;p&gt;With UDP, the tracking collector may work, but &lt;code&gt;serverstats&lt;&#x2F;code&gt; tends to get refused with &lt;code&gt;UNAUTH&lt;&#x2F;code&gt;. &lt;code&gt;cmdallow&lt;&#x2F;code&gt; only controls which addresses may reach the command port; it does not unlock the commands chronyd reserves for the Unix socket. You end up staring at &lt;code&gt;chrony_up 0&lt;&#x2F;code&gt; and wondering why a daemon on localhost is treating you like an intruder.&lt;&#x2F;p&gt;
&lt;p&gt;The Unix socket is the correct answer:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;no UDP command ACL weirdness&lt;&#x2F;li&gt;
&lt;li&gt;all collectors work&lt;&#x2F;li&gt;
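&lt;li&gt;permissions are handled explicitly, via &lt;code&gt;User=root&lt;&#x2F;code&gt; and &lt;code&gt;--collector.chmod-socket&lt;&#x2F;code&gt; in the unit&lt;&#x2F;li&gt;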
&lt;&#x2F;ul&gt;
&lt;p&gt;If the exporter misbehaves, these are the checks worth running:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;systemctl status chrony-exporter
journalctl -u chrony-exporter -n 30 --no-pager
curl -s http:&amp;#x2F;&amp;#x2F;localhost:9123&amp;#x2F;metrics | grep chrony_up
curl -s http:&amp;#x2F;&amp;#x2F;localhost:9123&amp;#x2F;metrics | grep ^chrony_
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If you need more detail, add &lt;code&gt;--log.level=debug&lt;&#x2F;code&gt; to &lt;code&gt;ExecStart&lt;&#x2F;code&gt;, reload systemd, restart the service, and look in the journal for &lt;code&gt;UNAUTH&lt;&#x2F;code&gt; or socket-permission complaints.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;prometheus-scrape-config&quot;&gt;Prometheus scrape config&lt;&#x2F;h3&gt;
&lt;p&gt;The scrape side is unremarkable, which is how monitoring should be:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;yaml&quot; class=&quot;language-yaml &quot;&gt;&lt;code class=&quot;language-yaml&quot; data-lang=&quot;yaml&quot;&gt;scrape_configs:
  - job_name: chrony
    static_configs:
      - targets: [&amp;#x27;chronos:9123&amp;#x27;]
        labels:
          instance: chronos
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
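&lt;p&gt;After reloading Prometheus, its HTTP API is a quick way to confirm the target is really being scraped. A sketch, where &lt;code&gt;prometheus:9090&lt;&#x2F;code&gt; is a placeholder for your own Prometheus server:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# up is 1 when the last scrape of the chrony job succeeded, 0 when it failed.
curl -sG http:&amp;#x2F;&amp;#x2F;prometheus:9090&amp;#x2F;api&amp;#x2F;v1&amp;#x2F;query --data-urlencode &amp;#x27;query=up{job=&amp;quot;chrony&amp;quot;}&amp;#x27;
# chrony_up is the exporter&amp;#x27;s own view of whether it could talk to chronyd.
curl -sG http:&amp;#x2F;&amp;#x2F;prometheus:9090&amp;#x2F;api&amp;#x2F;v1&amp;#x2F;query --data-urlencode &amp;#x27;query=chrony_up{instance=&amp;quot;chronos&amp;quot;}&amp;#x27;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Both should come back with a single series whose value is &lt;code&gt;1&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;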
&lt;h3 id=&quot;metrics-that-actually-matter&quot;&gt;Metrics that actually matter&lt;&#x2F;h3&gt;
&lt;p&gt;Tracking metrics tell you whether the machine itself is healthy:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;&#x2F;th&gt;&lt;th&gt;Meaning&lt;&#x2F;th&gt;&lt;th&gt;Good value&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;chrony_up&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Exporter can talk to chronyd&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;1&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;chrony_tracking_stratum&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;NTP stratum&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;1&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;chrony_tracking_system_time_seconds&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Current system offset&lt;&#x2F;td&gt;&lt;td&gt;under &lt;code&gt;1us&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;chrony_tracking_last_offset_seconds&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Latest measurement&lt;&#x2F;td&gt;&lt;td&gt;under &lt;code&gt;1us&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;chrony_tracking_rms_offset_seconds&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Long-term average offset&lt;&#x2F;td&gt;&lt;td&gt;under &lt;code&gt;10us&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;chrony_tracking_frequency_ppms&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Local oscillator drift&lt;&#x2F;td&gt;&lt;td&gt;stable and boring&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;chrony_tracking_skew_ppms&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Error bound on frequency&lt;&#x2F;td&gt;&lt;td&gt;low&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
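&lt;p&gt;The good-value column can also be checked straight against the exporter, without going through Prometheus. A minimal sketch that reads the plain-text &lt;code&gt;&#x2F;metrics&lt;&#x2F;code&gt; output; the function name is hypothetical, it only looks at the unlabelled series named in the table, and it compares the absolute offset in case the exporter reports it with a sign:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Read exporter output on stdin and exit 0 only if chrony_up is 1, the stratum
# is 1 and the absolute system time offset is under 1 microsecond.
metrics_ok() {
  awk &amp;#x27;
    $1 == &amp;quot;chrony_up&amp;quot; { up = $2 }
    $1 == &amp;quot;chrony_tracking_stratum&amp;quot; { stratum = $2 }
    $1 == &amp;quot;chrony_tracking_system_time_seconds&amp;quot; { offset = $2 + 0; seen = 1 }
    END {
      if (offset &amp;lt; 0) offset = -offset
      ok = (up == 1) &amp;amp;&amp;amp; (stratum == 1) &amp;amp;&amp;amp; seen &amp;amp;&amp;amp; (offset &amp;lt; 0.000001)
      exit (ok ? 0 : 1)
    }
  &amp;#x27;
}

# Usage against the exporter on the Pi:
#   curl -s http:&amp;#x2F;&amp;#x2F;localhost:9123&amp;#x2F;metrics | metrics_ok &amp;amp;&amp;amp; echo healthy
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;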
&lt;p&gt;Source metrics tell you whether individual peers and refclocks still look sane:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;&#x2F;th&gt;&lt;th&gt;Meaning&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;chrony_sources_last_sample_offset_seconds&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Offset per source&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;chrony_sources_last_sample_error_margin_seconds&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Error margin per source&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;chrony_sources_reachability_ratio&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Poll success ratio&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;chrony_sources_stratum&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Source stratum&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;chrony_sources_state_info&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;sync&lt;&#x2F;code&gt;, &lt;code&gt;candidate&lt;&#x2F;code&gt;, &lt;code&gt;outlier&lt;&#x2F;code&gt;, &lt;code&gt;unreachable&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;chrony_sources_polling_interval_seconds&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Poll interval&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;The nice thing about this combination is that it tells you both whether the local clock is good and why it is good.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;grafana-dashboard&quot;&gt;Grafana Dashboard&lt;&#x2F;h2&gt;
&lt;p&gt;The dashboard I ended up with has three sections:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Status row for “is this box healthy right now?”&lt;&#x2F;li&gt;
&lt;li&gt;Tracking row for offset, drift, and dispersion over time&lt;&#x2F;li&gt;
&lt;li&gt;Sources row for per-source behavior, including a table that shows which source is actually syncing&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The useful mental model is:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;GPS&lt;&#x2F;code&gt; should be noisy and coarse&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;PPS&lt;&#x2F;code&gt; should be boring and near zero&lt;&#x2F;li&gt;
&lt;li&gt;remote internet peers should be the backup singers, not the lead vocalist&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Here is the full dashboard JSON. Paste it into Grafana’s import dialog or provision it from a file.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;json&quot; class=&quot;language-json &quot;&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;{
  &amp;quot;annotations&amp;quot;: {
    &amp;quot;list&amp;quot;: []
  },
  &amp;quot;editable&amp;quot;: true,
  &amp;quot;fiscalYearStartMonth&amp;quot;: 0,
  &amp;quot;graphTooltip&amp;quot;: 1,
  &amp;quot;links&amp;quot;: [],
  &amp;quot;panels&amp;quot;: [
    {
      &amp;quot;collapsed&amp;quot;: false,
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 1, &amp;quot;w&amp;quot;: 24, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 0 },
      &amp;quot;id&amp;quot;: 1,
      &amp;quot;title&amp;quot;: &amp;quot;Status&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;row&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;mappings&amp;quot;: [
            { &amp;quot;options&amp;quot;: { &amp;quot;0&amp;quot;: { &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot;, &amp;quot;text&amp;quot;: &amp;quot;DOWN&amp;quot; }, &amp;quot;1&amp;quot;: { &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;text&amp;quot;: &amp;quot;UP&amp;quot; } }, &amp;quot;type&amp;quot;: &amp;quot;value&amp;quot; }
          ],
          &amp;quot;thresholds&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;, &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot;, &amp;quot;value&amp;quot;: null }, { &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: 1 }] }
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 3, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 1 },
      &amp;quot;id&amp;quot;: 2,
      &amp;quot;options&amp;quot;: { &amp;quot;colorMode&amp;quot;: &amp;quot;background&amp;quot;, &amp;quot;graphMode&amp;quot;: &amp;quot;none&amp;quot;, &amp;quot;justifyMode&amp;quot;: &amp;quot;center&amp;quot;, &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;], &amp;quot;fields&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;values&amp;quot;: false }, &amp;quot;textMode&amp;quot;: &amp;quot;value&amp;quot; },
      &amp;quot;title&amp;quot;: &amp;quot;Chrony Status&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;chrony_up{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }],
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;thresholds&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;, &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: null }, { &amp;quot;color&amp;quot;: &amp;quot;yellow&amp;quot;, &amp;quot;value&amp;quot;: 2 }, { &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot;, &amp;quot;value&amp;quot;: 3 }] }
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 3, &amp;quot;x&amp;quot;: 3, &amp;quot;y&amp;quot;: 1 },
      &amp;quot;id&amp;quot;: 3,
      &amp;quot;options&amp;quot;: { &amp;quot;colorMode&amp;quot;: &amp;quot;value&amp;quot;, &amp;quot;graphMode&amp;quot;: &amp;quot;none&amp;quot;, &amp;quot;justifyMode&amp;quot;: &amp;quot;center&amp;quot;, &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;], &amp;quot;fields&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;values&amp;quot;: false }, &amp;quot;textMode&amp;quot;: &amp;quot;value&amp;quot; },
      &amp;quot;title&amp;quot;: &amp;quot;Stratum&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;chrony_tracking_stratum{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }],
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;unit&amp;quot;: &amp;quot;s&amp;quot;,
          &amp;quot;decimals&amp;quot;: 3,
          &amp;quot;thresholds&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;, &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: null }, { &amp;quot;color&amp;quot;: &amp;quot;yellow&amp;quot;, &amp;quot;value&amp;quot;: 0.001 }, { &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot;, &amp;quot;value&amp;quot;: 0.01 }] }
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 4, &amp;quot;x&amp;quot;: 6, &amp;quot;y&amp;quot;: 1 },
      &amp;quot;id&amp;quot;: 4,
      &amp;quot;options&amp;quot;: { &amp;quot;colorMode&amp;quot;: &amp;quot;value&amp;quot;, &amp;quot;graphMode&amp;quot;: &amp;quot;area&amp;quot;, &amp;quot;justifyMode&amp;quot;: &amp;quot;center&amp;quot;, &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;], &amp;quot;fields&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;values&amp;quot;: false }, &amp;quot;textMode&amp;quot;: &amp;quot;value&amp;quot; },
      &amp;quot;title&amp;quot;: &amp;quot;System Time Offset&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;chrony_tracking_system_time_seconds{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }],
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;unit&amp;quot;: &amp;quot;s&amp;quot;,
          &amp;quot;decimals&amp;quot;: 6,
          &amp;quot;thresholds&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;, &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: null }, { &amp;quot;color&amp;quot;: &amp;quot;yellow&amp;quot;, &amp;quot;value&amp;quot;: 0.001 }, { &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot;, &amp;quot;value&amp;quot;: 0.01 }] }
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 4, &amp;quot;x&amp;quot;: 10, &amp;quot;y&amp;quot;: 1 },
      &amp;quot;id&amp;quot;: 5,
      &amp;quot;options&amp;quot;: { &amp;quot;colorMode&amp;quot;: &amp;quot;value&amp;quot;, &amp;quot;graphMode&amp;quot;: &amp;quot;area&amp;quot;, &amp;quot;justifyMode&amp;quot;: &amp;quot;center&amp;quot;, &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;], &amp;quot;fields&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;values&amp;quot;: false }, &amp;quot;textMode&amp;quot;: &amp;quot;value&amp;quot; },
      &amp;quot;title&amp;quot;: &amp;quot;Last Offset&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;chrony_tracking_last_offset_seconds{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }],
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;unit&amp;quot;: &amp;quot;s&amp;quot;,
          &amp;quot;decimals&amp;quot;: 6,
          &amp;quot;thresholds&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;, &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: null }, { &amp;quot;color&amp;quot;: &amp;quot;yellow&amp;quot;, &amp;quot;value&amp;quot;: 0.001 }, { &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot;, &amp;quot;value&amp;quot;: 0.01 }] }
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 4, &amp;quot;x&amp;quot;: 14, &amp;quot;y&amp;quot;: 1 },
      &amp;quot;id&amp;quot;: 6,
      &amp;quot;options&amp;quot;: { &amp;quot;colorMode&amp;quot;: &amp;quot;value&amp;quot;, &amp;quot;graphMode&amp;quot;: &amp;quot;area&amp;quot;, &amp;quot;justifyMode&amp;quot;: &amp;quot;center&amp;quot;, &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;], &amp;quot;fields&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;values&amp;quot;: false }, &amp;quot;textMode&amp;quot;: &amp;quot;value&amp;quot; },
      &amp;quot;title&amp;quot;: &amp;quot;RMS Offset&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;chrony_tracking_rms_offset_seconds{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }],
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;thresholds&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;, &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: null }] }
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 3, &amp;quot;x&amp;quot;: 18, &amp;quot;y&amp;quot;: 1 },
      &amp;quot;id&amp;quot;: 7,
      &amp;quot;options&amp;quot;: { &amp;quot;colorMode&amp;quot;: &amp;quot;value&amp;quot;, &amp;quot;graphMode&amp;quot;: &amp;quot;none&amp;quot;, &amp;quot;justifyMode&amp;quot;: &amp;quot;center&amp;quot;, &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;], &amp;quot;fields&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;values&amp;quot;: false }, &amp;quot;textMode&amp;quot;: &amp;quot;name&amp;quot; },
      &amp;quot;title&amp;quot;: &amp;quot;Reference&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;chrony_tracking_info{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{tracking_name}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }],
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;unit&amp;quot;: &amp;quot;ppm&amp;quot;,
          &amp;quot;decimals&amp;quot;: 3,
          &amp;quot;thresholds&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;, &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: null }, { &amp;quot;color&amp;quot;: &amp;quot;yellow&amp;quot;, &amp;quot;value&amp;quot;: 10 }, { &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot;, &amp;quot;value&amp;quot;: 50 }] }
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 3, &amp;quot;x&amp;quot;: 21, &amp;quot;y&amp;quot;: 1 },
      &amp;quot;id&amp;quot;: 8,
      &amp;quot;options&amp;quot;: { &amp;quot;colorMode&amp;quot;: &amp;quot;value&amp;quot;, &amp;quot;graphMode&amp;quot;: &amp;quot;area&amp;quot;, &amp;quot;justifyMode&amp;quot;: &amp;quot;center&amp;quot;, &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;], &amp;quot;fields&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;values&amp;quot;: false }, &amp;quot;textMode&amp;quot;: &amp;quot;value&amp;quot; },
      &amp;quot;title&amp;quot;: &amp;quot;Frequency Error&amp;quot;,
      &amp;quot;targets&amp;quot;: [{ &amp;quot;expr&amp;quot;: &amp;quot;chrony_tracking_frequency_ppms{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }],
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;
    },
    {
      &amp;quot;collapsed&amp;quot;: false,
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 1, &amp;quot;w&amp;quot;: 24, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 5 },
      &amp;quot;id&amp;quot;: 10,
      &amp;quot;title&amp;quot;: &amp;quot;Tracking&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;row&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; },
          &amp;quot;custom&amp;quot;: { &amp;quot;axisBorderShow&amp;quot;: false, &amp;quot;axisCenteredZero&amp;quot;: true, &amp;quot;axisLabel&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;drawStyle&amp;quot;: &amp;quot;line&amp;quot;, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;pointSize&amp;quot;: 5, &amp;quot;showPoints&amp;quot;: &amp;quot;never&amp;quot;, &amp;quot;spanNulls&amp;quot;: false },
          &amp;quot;unit&amp;quot;: &amp;quot;s&amp;quot;,
          &amp;quot;decimals&amp;quot;: 6
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 6 },
      &amp;quot;id&amp;quot;: 11,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;lastNotNull&amp;quot;, &amp;quot;min&amp;quot;, &amp;quot;max&amp;quot;], &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;bottom&amp;quot; }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot;, &amp;quot;sort&amp;quot;: &amp;quot;none&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;System Clock Offset&amp;quot;,
      &amp;quot;targets&amp;quot;: [
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_tracking_system_time_seconds{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;system time&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; },
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_tracking_last_offset_seconds{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;last offset&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;B&amp;quot; },
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_tracking_rms_offset_seconds{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;RMS offset&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;C&amp;quot; }
      ],
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; },
          &amp;quot;custom&amp;quot;: { &amp;quot;axisBorderShow&amp;quot;: false, &amp;quot;axisCenteredZero&amp;quot;: false, &amp;quot;axisLabel&amp;quot;: &amp;quot;ppm&amp;quot;, &amp;quot;drawStyle&amp;quot;: &amp;quot;line&amp;quot;, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;pointSize&amp;quot;: 5, &amp;quot;showPoints&amp;quot;: &amp;quot;never&amp;quot;, &amp;quot;spanNulls&amp;quot;: false },
          &amp;quot;unit&amp;quot;: &amp;quot;ppm&amp;quot;,
          &amp;quot;decimals&amp;quot;: 3
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 12, &amp;quot;y&amp;quot;: 6 },
      &amp;quot;id&amp;quot;: 12,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;lastNotNull&amp;quot;, &amp;quot;min&amp;quot;, &amp;quot;max&amp;quot;], &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;bottom&amp;quot; }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot;, &amp;quot;sort&amp;quot;: &amp;quot;none&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;Frequency&amp;quot;,
      &amp;quot;targets&amp;quot;: [
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_tracking_frequency_ppms{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;frequency error&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; },
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_tracking_residual_frequency_ppms{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;residual frequency&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;B&amp;quot; },
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_tracking_skew_ppms{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;skew (error bound)&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;C&amp;quot; }
      ],
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; },
          &amp;quot;custom&amp;quot;: { &amp;quot;axisBorderShow&amp;quot;: false, &amp;quot;axisCenteredZero&amp;quot;: false, &amp;quot;axisLabel&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;drawStyle&amp;quot;: &amp;quot;line&amp;quot;, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;pointSize&amp;quot;: 5, &amp;quot;showPoints&amp;quot;: &amp;quot;never&amp;quot;, &amp;quot;spanNulls&amp;quot;: false },
          &amp;quot;unit&amp;quot;: &amp;quot;s&amp;quot;,
          &amp;quot;decimals&amp;quot;: 6
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 14 },
      &amp;quot;id&amp;quot;: 13,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;lastNotNull&amp;quot;, &amp;quot;min&amp;quot;, &amp;quot;max&amp;quot;], &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;bottom&amp;quot; }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot;, &amp;quot;sort&amp;quot;: &amp;quot;none&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;Root Delay &amp;amp; Dispersion&amp;quot;,
      &amp;quot;targets&amp;quot;: [
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_tracking_root_delay_seconds{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;root delay&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; },
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_tracking_root_dispersion_seconds{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;root dispersion&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;B&amp;quot; }
      ],
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; },
          &amp;quot;custom&amp;quot;: { &amp;quot;axisBorderShow&amp;quot;: false, &amp;quot;axisCenteredZero&amp;quot;: false, &amp;quot;axisLabel&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;drawStyle&amp;quot;: &amp;quot;line&amp;quot;, &amp;quot;fillOpacity&amp;quot;: 0, &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;pointSize&amp;quot;: 5, &amp;quot;showPoints&amp;quot;: &amp;quot;never&amp;quot;, &amp;quot;spanNulls&amp;quot;: false },
          &amp;quot;unit&amp;quot;: &amp;quot;s&amp;quot;
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 12, &amp;quot;y&amp;quot;: 14 },
      &amp;quot;id&amp;quot;: 14,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;], &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;bottom&amp;quot; }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot;, &amp;quot;sort&amp;quot;: &amp;quot;none&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;Update Interval&amp;quot;,
      &amp;quot;targets&amp;quot;: [
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_tracking_update_interval_seconds{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;update interval&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }
      ],
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;
    },
    {
      &amp;quot;collapsed&amp;quot;: false,
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 1, &amp;quot;w&amp;quot;: 24, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 22 },
      &amp;quot;id&amp;quot;: 20,
      &amp;quot;title&amp;quot;: &amp;quot;Sources&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;row&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; },
          &amp;quot;custom&amp;quot;: { &amp;quot;axisBorderShow&amp;quot;: false, &amp;quot;axisCenteredZero&amp;quot;: true, &amp;quot;axisLabel&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;drawStyle&amp;quot;: &amp;quot;line&amp;quot;, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;pointSize&amp;quot;: 5, &amp;quot;showPoints&amp;quot;: &amp;quot;never&amp;quot;, &amp;quot;spanNulls&amp;quot;: false },
          &amp;quot;unit&amp;quot;: &amp;quot;s&amp;quot;,
          &amp;quot;decimals&amp;quot;: 6
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 9, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 23 },
      &amp;quot;id&amp;quot;: 21,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;lastNotNull&amp;quot;], &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;bottom&amp;quot;, &amp;quot;sortBy&amp;quot;: &amp;quot;Last *&amp;quot;, &amp;quot;sortDesc&amp;quot;: false }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot;, &amp;quot;sort&amp;quot;: &amp;quot;none&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;Source Offsets&amp;quot;,
      &amp;quot;targets&amp;quot;: [
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_sources_last_sample_offset_seconds{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{source_name}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }
      ],
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; },
          &amp;quot;custom&amp;quot;: { &amp;quot;axisBorderShow&amp;quot;: false, &amp;quot;axisCenteredZero&amp;quot;: false, &amp;quot;axisLabel&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;drawStyle&amp;quot;: &amp;quot;line&amp;quot;, &amp;quot;fillOpacity&amp;quot;: 10, &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;pointSize&amp;quot;: 5, &amp;quot;showPoints&amp;quot;: &amp;quot;never&amp;quot;, &amp;quot;spanNulls&amp;quot;: false },
          &amp;quot;unit&amp;quot;: &amp;quot;s&amp;quot;,
          &amp;quot;decimals&amp;quot;: 6
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 9, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 12, &amp;quot;y&amp;quot;: 23 },
      &amp;quot;id&amp;quot;: 22,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;mean&amp;quot;, &amp;quot;lastNotNull&amp;quot;], &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;bottom&amp;quot;, &amp;quot;sortBy&amp;quot;: &amp;quot;Last *&amp;quot;, &amp;quot;sortDesc&amp;quot;: false }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot;, &amp;quot;sort&amp;quot;: &amp;quot;none&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;Source Error Margin&amp;quot;,
      &amp;quot;targets&amp;quot;: [
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_sources_last_sample_error_margin_seconds{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{source_name}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }
      ],
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; },
          &amp;quot;custom&amp;quot;: { &amp;quot;axisBorderShow&amp;quot;: false, &amp;quot;axisCenteredZero&amp;quot;: false, &amp;quot;axisLabel&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;drawStyle&amp;quot;: &amp;quot;line&amp;quot;, &amp;quot;fillOpacity&amp;quot;: 30, &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;pointSize&amp;quot;: 5, &amp;quot;showPoints&amp;quot;: &amp;quot;never&amp;quot;, &amp;quot;spanNulls&amp;quot;: false },
          &amp;quot;min&amp;quot;: 0,
          &amp;quot;max&amp;quot;: 1,
          &amp;quot;unit&amp;quot;: &amp;quot;percentunit&amp;quot;,
          &amp;quot;decimals&amp;quot;: 0
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 32 },
      &amp;quot;id&amp;quot;: 23,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;], &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;bottom&amp;quot; }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot;, &amp;quot;sort&amp;quot;: &amp;quot;none&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;Source Reachability&amp;quot;,
      &amp;quot;targets&amp;quot;: [
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_sources_reachability_ratio{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{source_name}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }
      ],
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; },
          &amp;quot;custom&amp;quot;: { &amp;quot;axisBorderShow&amp;quot;: false, &amp;quot;axisCenteredZero&amp;quot;: false, &amp;quot;axisLabel&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;drawStyle&amp;quot;: &amp;quot;line&amp;quot;, &amp;quot;fillOpacity&amp;quot;: 0, &amp;quot;lineWidth&amp;quot;: 1, &amp;quot;pointSize&amp;quot;: 5, &amp;quot;showPoints&amp;quot;: &amp;quot;never&amp;quot;, &amp;quot;spanNulls&amp;quot;: false },
          &amp;quot;unit&amp;quot;: &amp;quot;s&amp;quot;
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 12, &amp;quot;y&amp;quot;: 32 },
      &amp;quot;id&amp;quot;: 24,
      &amp;quot;options&amp;quot;: { &amp;quot;legend&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;], &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;placement&amp;quot;: &amp;quot;bottom&amp;quot; }, &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot;, &amp;quot;sort&amp;quot;: &amp;quot;none&amp;quot; } },
      &amp;quot;title&amp;quot;: &amp;quot;Source Polling Interval&amp;quot;,
      &amp;quot;targets&amp;quot;: [
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_sources_polling_interval_seconds{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;legendFormat&amp;quot;: &amp;quot;{{source_name}}&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; }
      ],
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;custom&amp;quot;: {
            &amp;quot;align&amp;quot;: &amp;quot;auto&amp;quot;,
            &amp;quot;cellOptions&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;auto&amp;quot; },
            &amp;quot;inspect&amp;quot;: false
          },
          &amp;quot;mappings&amp;quot;: [],
          &amp;quot;thresholds&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;, &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: null }] }
        },
        &amp;quot;overrides&amp;quot;: [
          {
            &amp;quot;matcher&amp;quot;: { &amp;quot;id&amp;quot;: &amp;quot;byName&amp;quot;, &amp;quot;options&amp;quot;: &amp;quot;State&amp;quot; },
            &amp;quot;properties&amp;quot;: [
              {
                &amp;quot;id&amp;quot;: &amp;quot;mappings&amp;quot;,
                &amp;quot;value&amp;quot;: [
                  { &amp;quot;options&amp;quot;: { &amp;quot;outlier&amp;quot;: { &amp;quot;color&amp;quot;: &amp;quot;orange&amp;quot;, &amp;quot;text&amp;quot;: &amp;quot;outlier&amp;quot; }, &amp;quot;sync&amp;quot;: { &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;text&amp;quot;: &amp;quot;sync&amp;quot; }, &amp;quot;candidate&amp;quot;: { &amp;quot;color&amp;quot;: &amp;quot;blue&amp;quot;, &amp;quot;text&amp;quot;: &amp;quot;candidate&amp;quot; }, &amp;quot;unreachable&amp;quot;: { &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot;, &amp;quot;text&amp;quot;: &amp;quot;unreachable&amp;quot; } }, &amp;quot;type&amp;quot;: &amp;quot;value&amp;quot; }
                ]
              },
              { &amp;quot;id&amp;quot;: &amp;quot;custom.cellOptions&amp;quot;, &amp;quot;value&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;basic&amp;quot;, &amp;quot;type&amp;quot;: &amp;quot;color-background&amp;quot; } }
            ]
          },
          {
            &amp;quot;matcher&amp;quot;: { &amp;quot;id&amp;quot;: &amp;quot;byName&amp;quot;, &amp;quot;options&amp;quot;: &amp;quot;Stratum&amp;quot; },
            &amp;quot;properties&amp;quot;: [{ &amp;quot;id&amp;quot;: &amp;quot;custom.width&amp;quot;, &amp;quot;value&amp;quot;: 80 }]
          },
          {
            &amp;quot;matcher&amp;quot;: { &amp;quot;id&amp;quot;: &amp;quot;byName&amp;quot;, &amp;quot;options&amp;quot;: &amp;quot;Mode&amp;quot; },
            &amp;quot;properties&amp;quot;: [{ &amp;quot;id&amp;quot;: &amp;quot;custom.width&amp;quot;, &amp;quot;value&amp;quot;: 130 }]
          },
          {
            &amp;quot;matcher&amp;quot;: { &amp;quot;id&amp;quot;: &amp;quot;byName&amp;quot;, &amp;quot;options&amp;quot;: &amp;quot;Reachability&amp;quot; },
            &amp;quot;properties&amp;quot;: [
              { &amp;quot;id&amp;quot;: &amp;quot;unit&amp;quot;, &amp;quot;value&amp;quot;: &amp;quot;percentunit&amp;quot; },
              { &amp;quot;id&amp;quot;: &amp;quot;decimals&amp;quot;, &amp;quot;value&amp;quot;: 0 },
              { &amp;quot;id&amp;quot;: &amp;quot;custom.cellOptions&amp;quot;, &amp;quot;value&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;basic&amp;quot;, &amp;quot;type&amp;quot;: &amp;quot;color-background&amp;quot; } },
              { &amp;quot;id&amp;quot;: &amp;quot;thresholds&amp;quot;, &amp;quot;value&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;, &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot;, &amp;quot;value&amp;quot;: null }, { &amp;quot;color&amp;quot;: &amp;quot;yellow&amp;quot;, &amp;quot;value&amp;quot;: 0.5 }, { &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: 0.875 }] } }
            ]
          },
          {
            &amp;quot;matcher&amp;quot;: { &amp;quot;id&amp;quot;: &amp;quot;byName&amp;quot;, &amp;quot;options&amp;quot;: &amp;quot;Offset&amp;quot; },
            &amp;quot;properties&amp;quot;: [{ &amp;quot;id&amp;quot;: &amp;quot;unit&amp;quot;, &amp;quot;value&amp;quot;: &amp;quot;s&amp;quot; }, { &amp;quot;id&amp;quot;: &amp;quot;decimals&amp;quot;, &amp;quot;value&amp;quot;: 6 }]
          },
          {
            &amp;quot;matcher&amp;quot;: { &amp;quot;id&amp;quot;: &amp;quot;byName&amp;quot;, &amp;quot;options&amp;quot;: &amp;quot;Error Margin&amp;quot; },
            &amp;quot;properties&amp;quot;: [{ &amp;quot;id&amp;quot;: &amp;quot;unit&amp;quot;, &amp;quot;value&amp;quot;: &amp;quot;s&amp;quot; }, { &amp;quot;id&amp;quot;: &amp;quot;decimals&amp;quot;, &amp;quot;value&amp;quot;: 6 }]
          }
        ]
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 8, &amp;quot;w&amp;quot;: 24, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 40 },
      &amp;quot;id&amp;quot;: 25,
      &amp;quot;options&amp;quot;: { &amp;quot;showHeader&amp;quot;: true, &amp;quot;footer&amp;quot;: { &amp;quot;show&amp;quot;: false } },
      &amp;quot;title&amp;quot;: &amp;quot;Source Details&amp;quot;,
      &amp;quot;targets&amp;quot;: [
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_sources_state_info{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;format&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;instant&amp;quot;: true, &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot; },
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_sources_stratum{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;format&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;instant&amp;quot;: true, &amp;quot;refId&amp;quot;: &amp;quot;B&amp;quot; },
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_sources_reachability_ratio{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;format&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;instant&amp;quot;: true, &amp;quot;refId&amp;quot;: &amp;quot;C&amp;quot; },
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_sources_last_sample_offset_seconds{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;format&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;instant&amp;quot;: true, &amp;quot;refId&amp;quot;: &amp;quot;D&amp;quot; },
        { &amp;quot;expr&amp;quot;: &amp;quot;chrony_sources_last_sample_error_margin_seconds{instance=\&amp;quot;chronos\&amp;quot;}&amp;quot;, &amp;quot;format&amp;quot;: &amp;quot;table&amp;quot;, &amp;quot;instant&amp;quot;: true, &amp;quot;refId&amp;quot;: &amp;quot;E&amp;quot; }
      ],
      &amp;quot;transformations&amp;quot;: [
        {
          &amp;quot;id&amp;quot;: &amp;quot;joinByField&amp;quot;,
          &amp;quot;options&amp;quot;: { &amp;quot;byField&amp;quot;: &amp;quot;source_name&amp;quot;, &amp;quot;mode&amp;quot;: &amp;quot;outer&amp;quot; }
        },
        {
          &amp;quot;id&amp;quot;: &amp;quot;organize&amp;quot;,
          &amp;quot;options&amp;quot;: {
            &amp;quot;excludeByName&amp;quot;: {
              &amp;quot;Time&amp;quot;: true, &amp;quot;Time 1&amp;quot;: true, &amp;quot;Time 2&amp;quot;: true, &amp;quot;Time 3&amp;quot;: true, &amp;quot;Time 4&amp;quot;: true, &amp;quot;Time 5&amp;quot;: true,
              &amp;quot;__name__&amp;quot;: true, &amp;quot;__name__ 1&amp;quot;: true, &amp;quot;__name__ 2&amp;quot;: true, &amp;quot;__name__ 3&amp;quot;: true, &amp;quot;__name__ 4&amp;quot;: true,
              &amp;quot;instance&amp;quot;: true, &amp;quot;instance 1&amp;quot;: true, &amp;quot;instance 2&amp;quot;: true, &amp;quot;instance 3&amp;quot;: true, &amp;quot;instance 4&amp;quot;: true,
              &amp;quot;job&amp;quot;: true, &amp;quot;job 1&amp;quot;: true, &amp;quot;job 2&amp;quot;: true, &amp;quot;job 3&amp;quot;: true, &amp;quot;job 4&amp;quot;: true,
              &amp;quot;source_address&amp;quot;: true, &amp;quot;source_address 1&amp;quot;: true, &amp;quot;source_address 2&amp;quot;: true, &amp;quot;source_address 3&amp;quot;: true, &amp;quot;source_address 4&amp;quot;: true,
              &amp;quot;source_name 1&amp;quot;: true, &amp;quot;source_name 2&amp;quot;: true, &amp;quot;source_name 3&amp;quot;: true, &amp;quot;source_name 4&amp;quot;: true,
              &amp;quot;tracking_address&amp;quot;: true, &amp;quot;tracking_name&amp;quot;: true, &amp;quot;tracking_refid&amp;quot;: true
            },
            &amp;quot;renameByName&amp;quot;: {
              &amp;quot;source_name&amp;quot;: &amp;quot;Source&amp;quot;,
              &amp;quot;source_mode&amp;quot;: &amp;quot;Mode&amp;quot;,
              &amp;quot;source_state&amp;quot;: &amp;quot;State&amp;quot;,
              &amp;quot;Value #A&amp;quot;: &amp;quot;&amp;quot;,
              &amp;quot;Value #B&amp;quot;: &amp;quot;Stratum&amp;quot;,
              &amp;quot;Value #C&amp;quot;: &amp;quot;Reachability&amp;quot;,
              &amp;quot;Value #D&amp;quot;: &amp;quot;Offset&amp;quot;,
              &amp;quot;Value #E&amp;quot;: &amp;quot;Error Margin&amp;quot;
            }
          }
        },
        {
          &amp;quot;id&amp;quot;: &amp;quot;filterByValue&amp;quot;,
          &amp;quot;options&amp;quot;: {
            &amp;quot;filters&amp;quot;: [{ &amp;quot;fieldName&amp;quot;: &amp;quot;Source&amp;quot;, &amp;quot;config&amp;quot;: { &amp;quot;id&amp;quot;: &amp;quot;isNotNull&amp;quot; } }],
            &amp;quot;match&amp;quot;: &amp;quot;all&amp;quot;,
            &amp;quot;type&amp;quot;: &amp;quot;include&amp;quot;
          }
        }
      ],
      &amp;quot;type&amp;quot;: &amp;quot;table&amp;quot;
    }
  ],
  &amp;quot;schemaVersion&amp;quot;: 39,
  &amp;quot;templating&amp;quot;: {
    &amp;quot;list&amp;quot;: [
      {
        &amp;quot;current&amp;quot;: { &amp;quot;selected&amp;quot;: false, &amp;quot;text&amp;quot;: &amp;quot;Prometheus&amp;quot;, &amp;quot;value&amp;quot;: &amp;quot;PBFA97CFB590B2093&amp;quot; },
        &amp;quot;hide&amp;quot;: 0,
        &amp;quot;includeAll&amp;quot;: false,
        &amp;quot;name&amp;quot;: &amp;quot;datasource&amp;quot;,
        &amp;quot;options&amp;quot;: [],
        &amp;quot;query&amp;quot;: &amp;quot;prometheus&amp;quot;,
        &amp;quot;refresh&amp;quot;: 1,
        &amp;quot;type&amp;quot;: &amp;quot;datasource&amp;quot;
      }
    ]
  },
  &amp;quot;time&amp;quot;: { &amp;quot;from&amp;quot;: &amp;quot;now-6h&amp;quot;, &amp;quot;to&amp;quot;: &amp;quot;now&amp;quot; },
  &amp;quot;timepicker&amp;quot;: {},
  &amp;quot;timezone&amp;quot;: &amp;quot;&amp;quot;,
  &amp;quot;title&amp;quot;: &amp;quot;Chrony &amp;#x2F; GPS Time Server&amp;quot;,
  &amp;quot;uid&amp;quot;: &amp;quot;chrony-gps&amp;quot;,
  &amp;quot;version&amp;quot;: 1
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;quick-health-check&quot;&gt;Quick Health Check&lt;&#x2F;h2&gt;
&lt;p&gt;When you want to verify the full stack without thinking too hard, these are enough:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# GPS serial data arriving?
gpspipe -w -n 10 | grep TPV   # the first 3 lines are gpsd's VERSION, DEVICES and WATCH preamble

# PPS pulses arriving?
sudo ppstest &amp;#x2F;dev&amp;#x2F;pps0

# gpsd seeing PPS too?
gpspipe -w -n 10 | grep PPS

# chrony locked to PPS with tiny offset?
chronyc tracking

# all time sources sane?
chronyc sources -v

# exporter up?
curl -s http:&amp;#x2F;&amp;#x2F;localhost:9123&amp;#x2F;metrics | grep chrony_up

# scrape works from the monitoring host?
curl -s http:&amp;#x2F;&amp;#x2F;chronos:9123&amp;#x2F;metrics | grep chrony_tracking_stratum
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If those checks pass, the machine is doing its job.&lt;&#x2F;p&gt;
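&lt;p&gt;Those commands are meant for a human to read. To make the chrony check fail loudly from cron or a CI job instead, a minimal sketch could parse the CSV form of &lt;code&gt;chronyc tracking&lt;&#x2F;code&gt;. It assumes that &lt;code&gt;chronyc -c tracking&lt;&#x2F;code&gt; puts the reference name in field 2 and the current system time offset, in seconds, in field 5, so verify the column order against your chrony version first. &lt;code&gt;check_tracking&lt;&#x2F;code&gt; is a hypothetical helper, and &lt;code&gt;PPS&lt;&#x2F;code&gt; assumes that is the reference name your PPS refclock reports.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;#!/usr/bin/env bash
# Minimal sketch: a pass/fail check on top of `chronyc -c tracking`.
# ASSUMPTION: CSV field 2 = reference name, field 5 = system time offset (s).

# check_tracking CSV_LINE MAX_ABS_OFFSET_SECONDS EXPECTED_REF  (hypothetical)
# Returns 0 if chrony tracks EXPECTED_REF within the offset budget, else 1.
check_tracking() {
  local line=$1 max=$2 want_ref=$3 ref offset
  ref=$(printf '%s\n' "$line" | cut -d, -f2)
  offset=$(printf '%s\n' "$line" | cut -d, -f5)

  if [[ "$ref" != "$want_ref" ]]; then
    echo "FAIL: tracking '$ref', expected '$want_ref'" &amp;gt;&amp;amp;2
    return 1
  fi

  # bash has no floating point, so awk compares |offset| against the budget.
  # "+= 0" forces a numeric comparison; awk exits 1 when over budget.
  if awk -v o="$offset" -v m="$max" 'BEGIN { o += 0; m += 0; if (o &amp;lt; 0) o = -o; exit (o &amp;gt; m) }'; then
    echo "OK: $ref, offset ${offset}s"
  else
    echo "FAIL: offset ${offset}s exceeds ${max}s" &amp;gt;&amp;amp;2
    return 1
  fi
}

# Example: allow at most ten microseconds of offset while locked to PPS.
# Pick a budget that matches your own hardware.
# check_tracking "$(chronyc -c tracking)" 0.00001 PPS
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Returning non-zero instead of only printing keeps it usable from cron, a systemd timer, or any monitoring wrapper that understands exit codes.&lt;&#x2F;p&gt;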
&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping Up&lt;&#x2F;h2&gt;
&lt;p&gt;The working recipe is surprisingly clean once you stop trying to force &lt;code&gt;ntpd&lt;&#x2F;code&gt; through it:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;use &lt;code&gt;gpsd&lt;&#x2F;code&gt; for coarse time and visibility&lt;&#x2F;li&gt;
&lt;li&gt;read PPS directly from &lt;code&gt;&#x2F;dev&#x2F;pps0&lt;&#x2F;code&gt; with chrony&lt;&#x2F;li&gt;
&lt;li&gt;let chrony serve the LAN&lt;&#x2F;li&gt;
&lt;li&gt;export the state so you can see when something drifts, drops, or silently stops being clever&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The core lesson is that GPS plus PPS is a two-signal system, and your time daemon needs to respect that split. Let GPS tell you which second it is. Let PPS tell you exactly when it started. Let chrony do the fusion. And let &lt;code&gt;ntpd&lt;&#x2F;code&gt; enjoy retirement.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Reliable Headless Raspberry Pi Provisioning on NixOS</title>
        <published>2026-04-09T02:40:00+00:00</published>
        <updated>2026-04-09T02:40:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/reliable-headless-raspberry-pi-provisioning-nixos/"/>
        <id>https://perlpimp.net/blog/reliable-headless-raspberry-pi-provisioning-nixos/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/reliable-headless-raspberry-pi-provisioning-nixos/">&lt;p&gt;Headless Raspberry Pi provisioning sounds like it should be the easy case. Build an image, flash it, power it on, and SSH in. In practice, the first boot is where all the brittleness hides. If Wi-Fi is not up, if the host key is not the one you expected, or if secret decryption depends on services that themselves depend on decrypted secrets, you do not get a graceful fallback. You get a Pi that never appears on the network.&lt;&#x2F;p&gt;
&lt;p&gt;In this repo, the workflow we ended up trusting is:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix run .#raspberry-pi-provision-image -- &amp;lt;host&amp;gt; &amp;lt;output-path-or-directory&amp;gt;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That wrapper builds the host’s NixOS &lt;code&gt;sdImage&lt;&#x2F;code&gt;, copies it to the destination you asked for, and injects the bootstrap files the machine needs before its first boot. For &lt;code&gt;bastet&lt;&#x2F;code&gt;, those files are:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;&#x2F;var&#x2F;lib&#x2F;bootstrap&#x2F;ssh_host_ed25519_key&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;&#x2F;var&#x2F;lib&#x2F;bootstrap&#x2F;ssh_host_ed25519_key.pub&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;&#x2F;var&#x2F;lib&#x2F;networkmanager&#x2F;system-connections.env&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Those three files solve the hard part: stable SSH identity and working Wi-Fi from the first power-on, before the device is reachable for any remote deployment step.&lt;&#x2F;p&gt;
&lt;p&gt;The important lesson was not “use this exact script.” It was more general: for headless bootstrap on NixOS, image-time secret injection is often more reliable than trying to decrypt first-boot secrets on the device itself. And if you do inject mutable bootstrap state, it belongs under &lt;code&gt;&#x2F;var&#x2F;lib&#x2F;...&lt;&#x2F;code&gt;, not &lt;code&gt;&#x2F;etc&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-provisioning-goal&quot;&gt;The provisioning goal&lt;&#x2F;h2&gt;
&lt;p&gt;The target was simple enough:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Build a NixOS image for a Pi host.&lt;&#x2F;li&gt;
&lt;li&gt;Put that image on removable media.&lt;&#x2F;li&gt;
&lt;li&gt;Power on the Pi somewhere inconvenient.&lt;&#x2F;li&gt;
&lt;li&gt;Have it come up on Wi-Fi immediately with a predictable SSH host key.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;That last point matters more than it sounds. If the host key is precomputed, you can pin trust ahead of time instead of accepting a surprise key on first contact. If Wi-Fi credentials are already in place, the Pi can join the network without any keyboard, monitor, serial console, or “just plug it into Ethernet for the first boot” workaround.&lt;&#x2F;p&gt;
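&lt;p&gt;Pinning the key up front comes down to one &lt;code&gt;known_hosts&lt;&#x2F;code&gt; entry built from the precomputed public key. A minimal sketch, assuming the key pair was generated on the provisioning machine with something like &lt;code&gt;ssh-keygen -t ed25519 -N '' -f ssh_host_ed25519_key&lt;&#x2F;code&gt;. The helper &lt;code&gt;known_hosts_entry&lt;&#x2F;code&gt; and the host aliases are hypothetical.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;#!/usr/bin/env bash
# Minimal sketch: build a known_hosts line that pins a precomputed host key.

# known_hosts_entry HOST[,ALIAS...] PUBKEY_FILE  (hypothetical helper)
known_hosts_entry() {
  local hosts=$1 pubfile=$2 keytype="" key=""
  # A .pub file is "&amp;lt;type&amp;gt; &amp;lt;base64&amp;gt; [comment]"; the comment is dropped.
  read -r keytype key _ &amp;lt; "$pubfile" || true
  if [[ -z "$keytype" || -z "$key" ]]; then
    echo "malformed public key: $pubfile" &amp;gt;&amp;amp;2
    return 1
  fi
  # known_hosts format: comma-separated host names, key type, base64 key.
  printf '%s %s %s\n' "$hosts" "$keytype" "$key"
}

# Usage (host names are examples):
# known_hosts_entry bastet,bastet.local ssh_host_ed25519_key.pub &amp;gt;&amp;gt; ~/.ssh/known_hosts
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With that line in &lt;code&gt;known_hosts&lt;&#x2F;code&gt;, the first connection either matches the pinned key or is refused as a host key mismatch, instead of prompting you to trust whatever answered.&lt;&#x2F;p&gt;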
&lt;p&gt;For a small headless device, first boot is not the time to discover that your secret-management chain is technically elegant but operationally fragile.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-first-idea-decrypt-on-the-pi-at-boot&quot;&gt;The first idea: decrypt on the Pi at boot&lt;&#x2F;h2&gt;
&lt;p&gt;The original design was more declarative on paper.&lt;&#x2F;p&gt;
&lt;p&gt;Keep the image generic. Store Wi-Fi credentials in SOPS. Let the Pi decrypt them on first boot. Use the host’s precomputed SSH key as the age identity. Then let the networking stack consume the decrypted output and bring the machine online.&lt;&#x2F;p&gt;
&lt;p&gt;I still think this is an understandable instinct. It matches the way we want the rest of NixOS to work:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;secrets live encrypted in the repo&lt;&#x2F;li&gt;
&lt;li&gt;the machine decrypts what it needs locally&lt;&#x2F;li&gt;
&lt;li&gt;runtime services consume those secrets from known paths&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The problem is that on a fresh, headless Pi, this creates a bootstrap chain with too many timing-sensitive links:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;The host key has to be present.&lt;&#x2F;li&gt;
&lt;li&gt;SOPS has to use that key successfully.&lt;&#x2F;li&gt;
&lt;li&gt;The decrypted Wi-Fi secret has to land where networking expects it.&lt;&#x2F;li&gt;
&lt;li&gt;Networking has to come up correctly on the first try.&lt;&#x2F;li&gt;
&lt;li&gt;Remote access depends on all of the above already working.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;That is not impossible. It is just brittle enough that when it fails, recovery is annoying and usually physical.&lt;&#x2F;p&gt;
&lt;p&gt;This is the same family of problem I wrote about in &lt;a href=&quot;&#x2F;blog&#x2F;sops-bootstrap-problem&#x2F;&quot;&gt;Solving the NixOS SOPS Bootstrap Problem&lt;&#x2F;a&gt;, but harsher. A server that fails a bootstrap deploy can often still be reached over SSH or through a provider console. A Raspberry Pi on somebody else’s shelf does not usually give you that luxury.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-actually-failed&quot;&gt;What actually failed&lt;&#x2F;h2&gt;
&lt;p&gt;Several separate issues pushed us away from first-boot decryption and toward provisioning-time injection.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;injecting-into-etc-was-the-wrong-model&quot;&gt;Injecting into &lt;code&gt;&#x2F;etc&lt;&#x2F;code&gt; was the wrong model&lt;&#x2F;h2&gt;
&lt;p&gt;The first mistake was treating &lt;code&gt;&#x2F;etc&lt;&#x2F;code&gt; as a safe place to drop out-of-band bootstrap files into the image.&lt;&#x2F;p&gt;
&lt;p&gt;That works on plenty of conventional Linux setups. On NixOS, it is the wrong mental model. &lt;code&gt;&#x2F;etc&lt;&#x2F;code&gt; is declarative and system-managed. If you manually inject files into it before boot, you are competing with the activation logic that will happily regenerate or replace parts of it from the system configuration.&lt;&#x2F;p&gt;
&lt;p&gt;That gave us two flavors of breakage:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Wi-Fi-related files could disappear before the service that needed them had actually consumed them.&lt;&#x2F;li&gt;
&lt;li&gt;The injected SSH host key could exist on the first boot and then fail to persist the way we intended across later boots.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;This was the turning point in the design. The problem was not only “how do we get the secret there?” It was “what kind of path is appropriate for mutable bootstrap state on NixOS?”&lt;&#x2F;p&gt;
&lt;p&gt;The answer is not &lt;code&gt;&#x2F;etc&lt;&#x2F;code&gt;. It is &lt;code&gt;&#x2F;var&#x2F;lib&#x2F;...&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;sops-at-boot-was-too-timing-sensitive&quot;&gt;SOPS-at-boot was too timing-sensitive&lt;&#x2F;h2&gt;
&lt;p&gt;The next problem was the decryption chain itself.&lt;&#x2F;p&gt;
&lt;p&gt;On a running NixOS machine, SOPS-managed secrets are great. On a headless first boot where networking depends on the secret being available immediately, they become part of the boot-critical path. That means every dependency matters:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;the key material has to be present already&lt;&#x2F;li&gt;
&lt;li&gt;the decryption step has to run at the right time&lt;&#x2F;li&gt;
&lt;li&gt;the consumer has to wait for the output&lt;&#x2F;li&gt;
&lt;li&gt;the output path has to survive activation&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;If any part of that chain is wrong, the Pi does not partially succeed. It just never joins the network. That is exactly the sort of failure mode you want to design out of a headless device.&lt;&#x2F;p&gt;
&lt;p&gt;In other words, the issue was not that SOPS is bad. The issue was asking first-boot decryption to solve a bootstrap problem that is easier to solve one step earlier, on the provisioning machine.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;wpa-supplicant-made-host-local-bootstrap-harder-than-it-needed-to-be&quot;&gt;&lt;code&gt;wpa_supplicant&lt;&#x2F;code&gt; made host-local bootstrap harder than it needed to be&lt;&#x2F;h2&gt;
&lt;p&gt;The earliest iterations were more &lt;code&gt;wpa_supplicant&lt;&#x2F;code&gt;-centric, using host-local wireless configuration paths. That worked in the sense that it was possible to make it go, but it pushed more host-specific logic into the image build than I wanted, and it was not especially pleasant to reason about.&lt;&#x2F;p&gt;
&lt;p&gt;We eventually moved the shared Raspberry Pi path to &lt;code&gt;NetworkManager&lt;&#x2F;code&gt;, which made the image behavior more uniform and the bootstrap layout simpler. That let us inject one runtime file:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;&amp;#x2F;var&amp;#x2F;lib&amp;#x2F;networkmanager&amp;#x2F;system-connections.env
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;and have the image consume it from a persistent runtime location instead of trying to smuggle mutable first-boot state through a declarative &lt;code&gt;&#x2F;etc&lt;&#x2F;code&gt; path.&lt;&#x2F;p&gt;
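&lt;p&gt;On the provisioning side, that file can be rendered from the SOPS-encrypted source just before injection. A minimal sketch, assuming the image reads it as a systemd &lt;code&gt;EnvironmentFile&lt;&#x2F;code&gt; (for example through NixOS’s &lt;code&gt;networking.networkmanager.ensureProfiles.environmentFiles&lt;&#x2F;code&gt; option, if that is what the image uses). The variable names, the secrets path, and &lt;code&gt;render_env&lt;&#x2F;code&gt; itself are hypothetical. Values are single-quoted, which systemd takes verbatim, so the sketch refuses values containing a single quote or newline rather than trying to escape them.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;#!/usr/bin/env bash
# Minimal sketch: render decrypted Wi-Fi secrets as an env file.
# ASSUMPTION: the consumer parses it like a systemd EnvironmentFile.

# render_env NAME=VALUE ...  (hypothetical helper); env file on stdout
render_env() {
  local pair key value
  for pair in "$@"; do
    if [[ "$pair" != *=* ]]; then
      echo "not a NAME=VALUE pair: $pair" &amp;gt;&amp;amp;2
      return 1
    fi
    key=${pair%%=*} value=${pair#*=}
    if [[ ! "$key" =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]]; then
      echo "invalid variable name: $key" &amp;gt;&amp;amp;2
      return 1
    fi
    # Single-quoted values are read verbatim, so only ' and newline are a problem.
    if [[ "$value" == *"'"* || "$value" == *$'\n'* ]]; then
      echo "unsupported character in value of $key" &amp;gt;&amp;amp;2
      return 1
    fi
    printf "%s='%s'\n" "$key" "$value"
  done
}

# Usage on the provisioning machine (secrets path and key names are examples):
#   set -euo pipefail
#   ssid=$(sops -d --extract '["wifi"]["ssid"]' secrets/bastet.yaml)
#   psk=$(sops -d --extract '["wifi"]["psk"]' secrets/bastet.yaml)
#   (umask 077; render_env "WIFI_SSID=$ssid" "WIFI_PSK=$psk" &amp;gt; staging/system-connections.env)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Assigning each decrypted value to a variable first means a failing &lt;code&gt;sops&lt;&#x2F;code&gt; call stops a &lt;code&gt;set -e&lt;&#x2F;code&gt; script, instead of silently rendering an empty password into the image.&lt;&#x2F;p&gt;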
&lt;p&gt;This is not an argument that &lt;code&gt;wpa_supplicant&lt;&#x2F;code&gt; is wrong in general. It is an argument that, in this setup, &lt;code&gt;NetworkManager&lt;&#x2F;code&gt; was easier to make boring. And boring is exactly what you want from bootstrap networking.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-provisioner-initially-lied-to-us&quot;&gt;The provisioner initially lied to us&lt;&#x2F;h2&gt;
&lt;p&gt;This one was especially sneaky.&lt;&#x2F;p&gt;
&lt;p&gt;An early version of the image wrapper reported success even when the &lt;code&gt;debugfs&lt;&#x2F;code&gt; writes had failed, because the directories inside the ext4 image did not exist yet. So the script looked happy, the image looked provisioned, and the Pi still came up missing the files we thought we had injected.&lt;&#x2F;p&gt;
&lt;p&gt;That is the worst kind of automation bug: a false positive.&lt;&#x2F;p&gt;
&lt;p&gt;We fixed it by making the provisioner much less trusting:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;create the destination directories inside the image explicitly&lt;&#x2F;li&gt;
&lt;li&gt;write the files only after those directories exist&lt;&#x2F;li&gt;
&lt;li&gt;assert that the injected files are present afterward&lt;&#x2F;li&gt;
&lt;li&gt;verify the finished image contents directly&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Once the script started failing hard instead of cheerfully pretending, the whole flow became much easier to trust.&lt;&#x2F;p&gt;
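&lt;p&gt;A minimal sketch of that stricter injection step follows. It assumes the root filesystem is available as a bare ext4 image file rather than the whole partitioned &lt;code&gt;sdImage&lt;&#x2F;code&gt;, and that destination paths contain no spaces. The function names are hypothetical, not the provisioner’s actual code. It sticks to standard &lt;code&gt;debugfs&lt;&#x2F;code&gt; requests (&lt;code&gt;mkdir&lt;&#x2F;code&gt;, &lt;code&gt;rm&lt;&#x2F;code&gt;, &lt;code&gt;write&lt;&#x2F;code&gt;, &lt;code&gt;cat&lt;&#x2F;code&gt;) and treats a read-back comparison, not the write itself, as the success signal. File ownership and mode bits, which &lt;code&gt;sshd&lt;&#x2F;code&gt; is strict about for host keys, are not handled here.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helpers for injecting files into an ext4 image with debugfs.
# Assumes the image is a bare ext4 filesystem and paths contain no spaces.

# Create each directory along an absolute path, parent first, because a
# debugfs "write" into a directory that does not exist fails.
ensure_dirs() {
  local img="$1" rest="${2#/}" prefix=""
  while [ -n "$rest" ]; do
    prefix="$prefix/${rest%%/*}"
    case "$rest" in
      */*) rest="${rest#*/}" ;;
      *) rest="" ;;
    esac
    # "mkdir" complains about directories that already exist; that is fine.
    debugfs -w -R "mkdir $prefix" "$img" >/dev/null 2>/dev/null
  done
}

# Copy SRC into the image at DEST, then prove it landed by reading it back.
inject_file() {
  local img="$1" src="$2" dest="$3"
  ensure_dirs "$img" "$(dirname "$dest")"
  # Drop any earlier copy so "write" does not refuse an existing name.
  debugfs -w -R "rm $dest" "$img" >/dev/null 2>/dev/null
  debugfs -w -R "write $src $dest" "$img" >/dev/null 2>/dev/null
  # Do not trust the write step: compare the image contents with the source.
  if ! debugfs -R "cat $dest" "$img" 2>/dev/null | cmp -s - "$src"; then
    echo "inject_file: $dest is missing or differs inside $img" >/dev/stderr
    return 1
  fi
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Called as, for example, &lt;code&gt;inject_file rootfs.ext4 staged&#x2F;ssh_host_ed25519_key &#x2F;var&#x2F;lib&#x2F;bootstrap&#x2F;ssh_host_ed25519_key || exit 1&lt;&#x2F;code&gt; (with &lt;code&gt;rootfs.ext4&lt;&#x2F;code&gt; standing in for your image), a missing directory or a silently dropped write stops the provisioner instead of producing a plausible-looking image.&lt;&#x2F;p&gt;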
&lt;h2 id=&quot;the-design-we-kept&quot;&gt;The design we kept&lt;&#x2F;h2&gt;
&lt;p&gt;The reliable version of the workflow is much simpler:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Keep the source secrets in SOPS in the repo.&lt;&#x2F;li&gt;
&lt;li&gt;Decrypt them on the provisioning machine, where the age identity already exists.&lt;&#x2F;li&gt;
&lt;li&gt;Build the host’s &lt;code&gt;sdImage&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Inject the runtime files directly into the image before first boot.&lt;&#x2F;li&gt;
&lt;li&gt;Store those files under persistent runtime paths in &lt;code&gt;&#x2F;var&#x2F;lib&#x2F;...&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Point services at those paths directly.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
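&lt;p&gt;Steps 2 and 5 can be sketched as a small staging function that decrypts on the provisioning machine into a private directory laid out like the target filesystem. This is an illustration under several assumptions: &lt;code&gt;sops&lt;&#x2F;code&gt; can already find the age identity, the host key is stored under a hypothetical SOPS key named &lt;code&gt;ssh_host_ed25519_key&lt;&#x2F;code&gt;, and the NetworkManager side expects hypothetical &lt;code&gt;WIFI_SSID&lt;&#x2F;code&gt; and &lt;code&gt;WIFI_PSK&lt;&#x2F;code&gt; variables in the env file. NixOS’s &lt;code&gt;networking.networkmanager.ensureProfiles.environmentFiles&lt;&#x2F;code&gt; is one way to consume such a file. The &lt;code&gt;wifi_ssid&lt;&#x2F;code&gt; and &lt;code&gt;wifi_psk&lt;&#x2F;code&gt; key names match the SOPS layout used in this post.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical staging step: decrypt bootstrap secrets on the provisioning
# machine into STAGE, mirroring the paths they will have inside the image.
# The function body runs in a subshell so the umask change stays local.
stage_bootstrap_files() (
  secrets="$1" stage="$2"
  umask 077  # everything staged here is secret
  mkdir -p "$stage/var/lib/bootstrap" "$stage/var/lib/networkmanager" || exit 1

  # sops --extract takes a tree path such as '["wifi_psk"]'.
  ssid=$(sops -d --extract '["wifi_ssid"]' "$secrets") || exit 1
  psk=$(sops -d --extract '["wifi_psk"]' "$secrets") || exit 1
  sops -d --extract '["ssh_host_ed25519_key"]' "$secrets" \
    >"$stage/var/lib/bootstrap/ssh_host_ed25519_key" || exit 1

  # One KEY=value per line; WIFI_SSID and WIFI_PSK are assumed names.
  printf 'WIFI_SSID=%s\nWIFI_PSK=%s\n' "$ssid" "$psk" \
    >"$stage/var/lib/networkmanager/system-connections.env"
)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Each staged file can then be injected at the same path inside the image, and the staging directory deleted once the image is written.&lt;&#x2F;p&gt;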
&lt;p&gt;For this host, that means:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;OpenSSH uses &lt;code&gt;&#x2F;var&#x2F;lib&#x2F;bootstrap&#x2F;ssh_host_ed25519_key&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;NetworkManager reads &lt;code&gt;&#x2F;var&#x2F;lib&#x2F;networkmanager&#x2F;system-connections.env&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That design is slightly less symmetrical than “the machine decrypts everything itself at boot,” but it is much more reliable in the only moment that really matters: the first boot when the device is not yet reachable.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-var-lib-is-the-real-lesson&quot;&gt;Why &lt;code&gt;&#x2F;var&#x2F;lib&lt;&#x2F;code&gt; is the real lesson&lt;&#x2F;h2&gt;
&lt;p&gt;The larger design lesson here is not specific to Raspberry Pis.&lt;&#x2F;p&gt;
&lt;p&gt;If a file is mutable runtime state, or bootstrap state that should survive independently of declarative &lt;code&gt;&#x2F;etc&lt;&#x2F;code&gt; generation, it probably belongs somewhere under &lt;code&gt;&#x2F;var&#x2F;lib&lt;&#x2F;code&gt;. On NixOS, this is not just a filesystem preference. It is a boundary between system-managed declarative configuration and persistent machine-local state.&lt;&#x2F;p&gt;
&lt;p&gt;Once I started looking at the problem through that lens, the right shape became obvious:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;&#x2F;etc&lt;&#x2F;code&gt; is for declarative configuration rendered by the system&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;&#x2F;var&#x2F;lib&lt;&#x2F;code&gt; is for persistent mutable state owned by services or bootstrap tooling&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Trying to force bootstrap secrets into &lt;code&gt;&#x2F;etc&lt;&#x2F;code&gt; was really a category error.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-tradeoff&quot;&gt;The tradeoff&lt;&#x2F;h2&gt;
&lt;p&gt;The compromise is that the Wi-Fi secret is currently treated as day-0 bootstrap state.&lt;&#x2F;p&gt;
&lt;p&gt;A plain remote &lt;code&gt;nixos-rebuild&lt;&#x2F;code&gt; later does not automatically rotate the injected Wi-Fi credential unless we add a second runtime-managed secret update path for post-bootstrap changes. So yes, the current setup gives up some declarative neatness.&lt;&#x2F;p&gt;
&lt;p&gt;I think that is the correct trade.&lt;&#x2F;p&gt;
&lt;p&gt;Wi-Fi PSKs do not change often. First-boot connectivity absolutely has to work. Once the Pi is online and reachable, every other kind of refinement becomes easy again. The hard problem is not long-term secret rotation. The hard problem is getting a tiny headless box to show up reliably the first time.&lt;&#x2F;p&gt;
&lt;p&gt;If the price of that reliability is that bootstrap networking is provisioned one step earlier, on the machine building the image, I will happily pay it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-practical-workflow&quot;&gt;The practical workflow&lt;&#x2F;h2&gt;
&lt;p&gt;For a new managed Raspberry Pi host, the workflow now looks like this:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Add the host and image target.&lt;&#x2F;li&gt;
&lt;li&gt;Put &lt;code&gt;wifi_ssid&lt;&#x2F;code&gt;, &lt;code&gt;wifi_psk&lt;&#x2F;code&gt;, and the precomputed SSH host key in the host’s SOPS file.&lt;&#x2F;li&gt;
&lt;li&gt;Run:&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix run .#raspberry-pi-provision-image -- &amp;lt;host&amp;gt; &amp;lt;destination&amp;gt;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;ol start=&quot;4&quot;&gt;
&lt;li&gt;Flash the resulting image.&lt;&#x2F;li&gt;
&lt;li&gt;Boot the Pi.&lt;&#x2F;li&gt;
&lt;li&gt;Verify that it is reachable and exporting what you expect:&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;ssh-keyscan &amp;lt;ip&amp;gt;
ssh &amp;lt;ip&amp;gt; upsc -l
curl http:&amp;#x2F;&amp;#x2F;&amp;lt;ip&amp;gt;:9199&amp;#x2F;ups_metrics
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That last verification block is intentionally concrete. When the machine is a small appliance-style host, “can I SSH in?” is necessary but not sufficient. If it is supposed to be handling UPS monitoring or some other edge function, check the real service path immediately while the provisioning details are still fresh in your head.&lt;&#x2F;p&gt;
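&lt;p&gt;Because the SSH host key was precomputed, the first check can do more than confirm that something answers. You can confirm the Pi presents exactly the key baked into the image. A minimal sketch, assuming you pass in the public half of that key (for example the output of &lt;code&gt;ssh-keygen -y -f&lt;&#x2F;code&gt; on the private key) and that polling every ten seconds is acceptable. The function name and its return codes are hypothetical.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical check: wait for a freshly flashed Pi to answer on SSH, then
# compare its ed25519 host key with the precomputed one.
# Returns 0 on match, 2 on a different key, 1 if the host never answers.
wait_for_expected_host_key() {
  local host="$1" expected_pub="$2" attempts="${3:-30}" want got i=1
  # expected_pub looks like "ssh-ed25519 AAAA... comment"; keep the key blob.
  want=$(printf '%s\n' "$expected_pub" | awk '{print $2}')
  while [ "$i" -le "$attempts" ]; do
    # ssh-keyscan prints "host keytype key"; pick the ed25519 blob.
    got=$(ssh-keyscan -T 5 -t ed25519 "$host" 2>/dev/null |
      awk '$2 == "ssh-ed25519" {print $3; exit}')
    if [ -n "$got" ]; then
      if [ "$got" = "$want" ]; then
        echo "$host: host key matches"
        return 0
      fi
      echo "$host: unexpected host key" >/dev/stderr
      return 2
    fi
    sleep 10
    i=$((i + 1))
  done
  echo "$host: no SSH answer after $attempts attempts" >/dev/stderr
  return 1
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For example: &lt;code&gt;wait_for_expected_host_key 192.0.2.10 &quot;$(ssh-keygen -y -f ssh_host_ed25519_key)&quot;&lt;&#x2F;code&gt;. A return code of 2 means something answered on that address with a different key. Investigate that before going further, because it points at the wrong IP or at an image that did not get the injected key.&lt;&#x2F;p&gt;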
&lt;h2 id=&quot;closing-thought&quot;&gt;Closing thought&lt;&#x2F;h2&gt;
&lt;p&gt;I started out wanting a more elegant first-boot story. What I ended up wanting more was one that worked every time.&lt;&#x2F;p&gt;
&lt;p&gt;For headless Raspberry Pi provisioning on NixOS, image-time secret injection turned out to be the pragmatic answer. Decrypt on the provisioning machine. Inject only the runtime files needed for first boot. Put them under &lt;code&gt;&#x2F;var&#x2F;lib&#x2F;...&lt;&#x2F;code&gt;, not &lt;code&gt;&#x2F;etc&lt;&#x2F;code&gt;. Let the first boot be boring.&lt;&#x2F;p&gt;
&lt;p&gt;That is not the most purely declarative design. It is, however, the one I would trust before mailing a Pi to another building and hoping it comes back on Wi-Fi.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Monitoring UPS Systems with NUT, Prometheus, and Grafana</title>
        <published>2026-04-08T00:13:34+00:00</published>
        <updated>2026-04-08T00:13:34+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/monitoring-ups-nut-prometheus-grafana/"/>
        <id>https://perlpimp.net/blog/monitoring-ups-nut-prometheus-grafana/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/monitoring-ups-nut-prometheus-grafana/">&lt;p&gt;Power failures are one of those problems that feel theoretical right up until a UPS starts beeping and something important goes dark. At that point you do not want to SSH into a host and manually run &lt;code&gt;upsc&lt;&#x2F;code&gt; just to figure out whether the battery is healthy, how much load the unit is carrying, or whether the mains voltage has been wobbling all afternoon.&lt;&#x2F;p&gt;
&lt;p&gt;I wanted the same thing I want from the rest of my infrastructure: one place to look, one scrape path, and graphs that tell me whether the UPS is quietly doing its job or about to ruin my evening.&lt;&#x2F;p&gt;
&lt;p&gt;The setup here monitors two Eaton units:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Host&lt;&#x2F;th&gt;&lt;th&gt;OS&lt;&#x2F;th&gt;&lt;th&gt;UPS Model&lt;&#x2F;th&gt;&lt;th&gt;Rated Power&lt;&#x2F;th&gt;&lt;th&gt;Role&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;fatty&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;FreeBSD 14.7&lt;&#x2F;td&gt;&lt;td&gt;Eaton Ellipse PRO 1600&lt;&#x2F;td&gt;&lt;td&gt;1600 VA&lt;&#x2F;td&gt;&lt;td&gt;Main server running bhyve VMs&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;chronos&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Arch Linux ARM&lt;&#x2F;td&gt;&lt;td&gt;Eaton 3S 550&lt;&#x2F;td&gt;&lt;td&gt;550 VA&lt;&#x2F;td&gt;&lt;td&gt;Raspberry Pi GPS&#x2F;NTP time server&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;Both UPS units are USB-attached to their local host. NUT talks to the hardware, &lt;code&gt;nut_exporter&lt;&#x2F;code&gt; translates that into Prometheus metrics, Prometheus scrapes it, and Grafana turns it into something you can glance at in five seconds.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-the-stack-fits-together&quot;&gt;How the stack fits together&lt;&#x2F;h2&gt;
&lt;p&gt;The data path is simple:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;UPS --USB--&amp;gt; NUT driver + upsd + upsmon --TCP:3493--&amp;gt; nut_exporter --HTTP:9199&amp;#x2F;ups_metrics--&amp;gt; Prometheus --&amp;gt; Grafana
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;NUT is really three pieces:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;usbhid-ups&lt;&#x2F;code&gt; talks to the UPS over USB HID.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;upsd&lt;&#x2F;code&gt; serves UPS state to clients on TCP port 3493.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;upsmon&lt;&#x2F;code&gt; watches battery state and handles shutdown behavior.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;&lt;code&gt;nut_exporter&lt;&#x2F;code&gt; connects to &lt;code&gt;upsd&lt;&#x2F;code&gt;, asks for a defined set of variables, and exposes them as Prometheus metrics. Prometheus does not need to know anything about USB, Eaton, or NUT internals. It just scrapes HTTP.&lt;&#x2F;p&gt;
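&lt;p&gt;To see the kind of question the exporter asks, you can query &lt;code&gt;upsd&lt;&#x2F;code&gt; by hand over NUT’s plain-text network protocol on port 3493. A rough sketch, assuming a netcat that exits once the server closes the connection (&lt;code&gt;upsd&lt;&#x2F;code&gt; closes it after &lt;code&gt;LOGOUT&lt;&#x2F;code&gt;). The helper name is hypothetical, and values containing escaped quotes are not unescaped.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: fetch one variable from upsd over the NUT protocol.
# upsd answers "GET VAR ups battery.charge" with: VAR ups battery.charge "100"
nut_get_var() {
  local host="$1" ups="$2" var="$3"
  printf 'GET VAR %s %s\nLOGOUT\n' "$ups" "$var" |
    nc "$host" 3493 |
    awk -v ups="$ups" -v var="$var" '
      $1 == "VAR" {
        if ($2 == ups) if ($3 == var) {
          value = $0
          sub(/^[^"]*"/, "", value)  # drop everything up to the opening quote
          sub(/"$/, "", value)       # drop the closing quote
          print value
        }
      }
      # Errors look like "ERR UNKNOWN-UPS" or "ERR DATA-STALE".
      $1 == "ERR" { print "upsd: " $0 > "/dev/stderr"; failed = 1 }
      END { exit failed }'
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With the localhost-only &lt;code&gt;upsd&lt;&#x2F;code&gt; configuration used in this post, run it on the UPS host itself, for example &lt;code&gt;nut_get_var 127.0.0.1 ups battery.charge&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;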
&lt;h2 id=&quot;nut-configuration&quot;&gt;NUT configuration&lt;&#x2F;h2&gt;
&lt;p&gt;The configuration is almost the same on both hosts. The big FreeBSD wrinkle is pathing: on Linux the config usually lives under &lt;code&gt;&#x2F;etc&#x2F;nut&#x2F;&lt;&#x2F;code&gt;, while on FreeBSD it typically lives under &lt;code&gt;&#x2F;usr&#x2F;local&#x2F;etc&#x2F;nut&#x2F;&lt;&#x2F;code&gt;. The service names differ too: Linux usually splits the units out, while FreeBSD gives you &lt;code&gt;nut&lt;&#x2F;code&gt; and &lt;code&gt;nut_upsmon&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;The core device definition is minimal:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;ini&quot; class=&quot;language-ini &quot;&gt;&lt;code class=&quot;language-ini&quot; data-lang=&quot;ini&quot;&gt;# &amp;#x2F;etc&amp;#x2F;nut&amp;#x2F;ups.conf on Linux
# &amp;#x2F;usr&amp;#x2F;local&amp;#x2F;etc&amp;#x2F;nut&amp;#x2F;ups.conf on FreeBSD
[ups]
    driver = usbhid-ups
    port = auto
    desc = &amp;quot;Eaton UPS&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;port = auto&lt;&#x2F;code&gt; is usually enough for a single directly attached unit. The section name matters because that becomes the UPS identifier later, for example &lt;code&gt;upsc ups@localhost&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;Restrict &lt;code&gt;upsd&lt;&#x2F;code&gt; to localhost if the exporter runs on the same machine:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;ini&quot; class=&quot;language-ini &quot;&gt;&lt;code class=&quot;language-ini&quot; data-lang=&quot;ini&quot;&gt;# upsd.conf
LISTEN 127.0.0.1 3493
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And the basic &lt;code&gt;upsmon&lt;&#x2F;code&gt; user and monitor configuration looks like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;ini&quot; class=&quot;language-ini &quot;&gt;&lt;code class=&quot;language-ini&quot; data-lang=&quot;ini&quot;&gt;# upsd.users
[monitor]
    password = secret
    upsmon master
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;pre data-lang=&quot;ini&quot; class=&quot;language-ini &quot;&gt;&lt;code class=&quot;language-ini&quot; data-lang=&quot;ini&quot;&gt;# upsmon.conf
MONITOR ups@localhost 1 monitor secret master
SHUTDOWNCMD &amp;quot;&amp;#x2F;sbin&amp;#x2F;shutdown -h +0&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For a directly attached UPS on a single host, standalone mode is the right fit:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;ini&quot; class=&quot;language-ini &quot;&gt;&lt;code class=&quot;language-ini&quot; data-lang=&quot;ini&quot;&gt;# nut.conf
MODE=standalone
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That tells NUT to run the driver, &lt;code&gt;upsd&lt;&#x2F;code&gt;, and &lt;code&gt;upsmon&lt;&#x2F;code&gt; together.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;verifying-nut-before-you-add-prometheus&quot;&gt;Verifying NUT before you add Prometheus&lt;&#x2F;h2&gt;
&lt;p&gt;Do not start with Grafana. Start with &lt;code&gt;upsc&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;These are the checks that matter most:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Linux
systemctl status nut-server
systemctl status nut-monitor
systemctl status nut-driver

# FreeBSD
service nut status
service nut_upsmon status

# List UPS names known to upsd
upsc -l

# Dump all variables
upsc ups@localhost

# Spot check the useful ones
upsc ups@localhost battery.charge
upsc ups@localhost ups.status
upsc ups@localhost input.voltage

# Check direct driver state
upsdrvctl status
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If this layer is broken, Prometheus is not the problem.&lt;&#x2F;p&gt;
&lt;p&gt;The common failure modes are the usual ones:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;USB permissions are wrong, so the driver cannot open the device.&lt;&#x2F;li&gt;
&lt;li&gt;The wrong driver is configured in &lt;code&gt;ups.conf&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;The driver has lost contact with the UPS, so &lt;code&gt;upsc&lt;&#x2F;code&gt; reports &lt;code&gt;Data stale&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;upsd&lt;&#x2F;code&gt; is not listening where you think it is.&lt;&#x2F;li&gt;
&lt;li&gt;FreeBSD-specific config paths or service names got copied from Linux docs without adjustment.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;For Eaton gear, &lt;code&gt;usbhid-ups&lt;&#x2F;code&gt; is usually the right driver. If the box has multiple USB UPS devices, you may need to add &lt;code&gt;serial&lt;&#x2F;code&gt;, &lt;code&gt;vendorid&lt;&#x2F;code&gt;, or &lt;code&gt;productid&lt;&#x2F;code&gt; instead of relying on autodetect.&lt;&#x2F;p&gt;
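&lt;p&gt;Those checks can be folded into a small script for cron or a monitoring agent. A sketch, assuming the UPS is named &lt;code&gt;ups&lt;&#x2F;code&gt; as in the &lt;code&gt;ups.conf&lt;&#x2F;code&gt; above. The function name, the messages, and the Nagios-style exit codes (0 OK, 1 warning, 2 critical) are illustrative choices, not anything NUT defines.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical NUT health check built on upsc.
ups_health() {
  local ups="${1:-ups@localhost}" status charge state
  # upsc exits non-zero on stale data, a missing driver, or a bad UPS name.
  if ! status=$(upsc "$ups" ups.status 2>/dev/null); then
    echo "CRIT: upsc cannot read $ups (stale data, driver down, or wrong name)"
    return 2
  fi
  charge=$(upsc "$ups" battery.charge 2>/dev/null) || charge="?"

  case " $status " in
    *" OB "*) state="on battery" ;;
    *" OL "*) state="online" ;;
    *) state="unknown" ;;
  esac

  # Low-battery or replace-battery flags are always critical.
  case " $status " in
    *" LB "* | *" RB "*)
      echo "CRIT: $ups $state, flags: $status, battery $charge%"
      return 2 ;;
  esac
  if [ "$state" != "online" ]; then
    echo "WARN: $ups $state, flags: $status, battery $charge%"
    return 1
  fi
  echo "OK: $ups online, battery $charge%"
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;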
&lt;h2 id=&quot;exporting-nut-metrics&quot;&gt;Exporting NUT metrics&lt;&#x2F;h2&gt;
&lt;p&gt;The exporter here is &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;DRuggeri&#x2F;nut_exporter&quot;&gt;DRuggeri’s nut_exporter&lt;&#x2F;a&gt;. It is a small Go binary that queries &lt;code&gt;upsd&lt;&#x2F;code&gt; and exposes UPS metrics over HTTP.&lt;&#x2F;p&gt;
&lt;p&gt;The key behavior to remember is that it serves UPS metrics on &lt;code&gt;&#x2F;ups_metrics&lt;&#x2F;code&gt;, not the Prometheus default &lt;code&gt;&#x2F;metrics&lt;&#x2F;code&gt;. Miss that detail and you will spend too long staring at empty scrape targets.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;installing-the-exporter&quot;&gt;Installing the exporter&lt;&#x2F;h3&gt;
&lt;p&gt;Example release install commands:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Linux arm64 (Raspberry Pi)
curl -LO https:&amp;#x2F;&amp;#x2F;github.com&amp;#x2F;DRuggeri&amp;#x2F;nut_exporter&amp;#x2F;releases&amp;#x2F;download&amp;#x2F;2.5.2&amp;#x2F;nut_exporter-2.5.2.linux-arm64.tar.gz
tar xzf nut_exporter-2.5.2.linux-arm64.tar.gz
sudo cp nut_exporter-2.5.2.linux-arm64&amp;#x2F;nut_exporter &amp;#x2F;usr&amp;#x2F;local&amp;#x2F;bin&amp;#x2F;

# FreeBSD amd64
curl -LO https:&amp;#x2F;&amp;#x2F;github.com&amp;#x2F;DRuggeri&amp;#x2F;nut_exporter&amp;#x2F;releases&amp;#x2F;download&amp;#x2F;2.5.2&amp;#x2F;nut_exporter-2.5.2.freebsd-amd64.tar.gz
tar xzf nut_exporter-2.5.2.freebsd-amd64.tar.gz
sudo cp nut_exporter-2.5.2.freebsd-amd64&amp;#x2F;nut_exporter &amp;#x2F;usr&amp;#x2F;local&amp;#x2F;bin&amp;#x2F;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;running-it&quot;&gt;Running it&lt;&#x2F;h3&gt;
&lt;p&gt;This is the flag set I care about:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;&amp;#x2F;usr&amp;#x2F;local&amp;#x2F;bin&amp;#x2F;nut_exporter \
  --web.listen-address=:9199 \
  --metrics.namespace=network_ups_tools \
  --nut.vars_enable=battery.charge,battery.voltage,battery.runtime,input.voltage,input.voltage.nominal,output.voltage,output.frequency,ups.load,ups.status,ups.power,ups.power.nominal,ups.realpower,ups.beeper.status
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Those flags do three useful things:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;--web.listen-address=:9199&lt;&#x2F;code&gt; chooses the HTTP port.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;--metrics.namespace=network_ups_tools&lt;&#x2F;code&gt; pins the metric name prefix that the queries and dashboard below depend on.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;--nut.vars_enable=...&lt;&#x2F;code&gt; whitelists the exact NUT variables worth scraping.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Without the variable whitelist, you end up exporting a lot of noise. UPS metrics are already niche enough; there is no need to make the series set bigger than necessary.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;systemd-service-on-linux&quot;&gt;systemd service on Linux&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;ini&quot; class=&quot;language-ini &quot;&gt;&lt;code class=&quot;language-ini&quot; data-lang=&quot;ini&quot;&gt;[Unit]
Description=NUT Exporter
After=nut-server.service

[Service]
User=nobody
ExecStart=&amp;#x2F;usr&amp;#x2F;local&amp;#x2F;bin&amp;#x2F;nut_exporter \
  --web.listen-address=:9199 \
  --metrics.namespace=network_ups_tools \
  --nut.vars_enable=battery.charge,battery.voltage,battery.runtime,input.voltage,input.voltage.nominal,output.voltage,output.frequency,ups.load,ups.status,ups.power,ups.power.nominal,ups.realpower,ups.beeper.status
Restart=always

[Install]
WantedBy=multi-user.target
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;freebsd-startup&quot;&gt;FreeBSD startup&lt;&#x2F;h3&gt;
&lt;p&gt;On FreeBSD I keep it simple in &lt;code&gt;rc.conf&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;sh&quot; class=&quot;language-sh &quot;&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;nut_exporter_enable=&amp;quot;YES&amp;quot;
nut_exporter_flags=&amp;quot;--web.listen-address=:9199 --metrics.namespace=network_ups_tools --nut.vars_enable=battery.charge,battery.voltage,battery.runtime,input.voltage,input.voltage.nominal,output.voltage,output.frequency,ups.load,ups.status,ups.power,ups.power.nominal,ups.realpower,ups.beeper.status&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
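&lt;p&gt;These variables only take effect if an rc.d script named &lt;code&gt;nut_exporter&lt;&#x2F;code&gt; exists to read them. A binary copied into &lt;code&gt;&#x2F;usr&#x2F;local&#x2F;bin&lt;&#x2F;code&gt; does not come with one, so either write a small rc.d script or use a package that ships one. In both cases, the important part is that the exporter starts after NUT and queries the same UPS name that &lt;code&gt;upsc&lt;&#x2F;code&gt; uses.&lt;&#x2F;p&gt;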
&lt;h3 id=&quot;debugging-the-exporter&quot;&gt;Debugging the exporter&lt;&#x2F;h3&gt;
&lt;p&gt;These checks tell you quickly whether the exporter layer is alive:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;ps aux | grep nut_exporter

curl -s http:&amp;#x2F;&amp;#x2F;localhost:9199&amp;#x2F;ups_metrics | head -30
curl -s http:&amp;#x2F;&amp;#x2F;localhost:9199&amp;#x2F;ups_metrics | grep network_ups_tools_ups_status
curl -s http:&amp;#x2F;&amp;#x2F;localhost:9199&amp;#x2F;ups_metrics | grep network_ups_tools_battery_charge
curl -s http:&amp;#x2F;&amp;#x2F;localhost:9199&amp;#x2F;ups_metrics | grep network_ups_tools_device_info
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And from the monitoring host:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;curl -s http:&amp;#x2F;&amp;#x2F;fatty:9199&amp;#x2F;ups_metrics | grep network_ups_tools_ups_load
curl -s http:&amp;#x2F;&amp;#x2F;chronos:9199&amp;#x2F;ups_metrics | grep network_ups_tools_ups_load
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If the endpoint is empty, either &lt;code&gt;upsd&lt;&#x2F;code&gt; is unreachable or the UPS name is wrong. If the metrics exist but values are stale or zero, NUT itself is usually the real problem.&lt;&#x2F;p&gt;
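&lt;p&gt;When piping through &lt;code&gt;grep&lt;&#x2F;code&gt; gets tedious, a small &lt;code&gt;awk&lt;&#x2F;code&gt; filter can pull a single value out of the exposition format. A sketch, assuming one sample per line in the usual &lt;code&gt;name{labels} value&lt;&#x2F;code&gt; layout with no trailing timestamp. The function name is hypothetical, and the optional label filter is a plain substring match.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: print the value(s) of one metric from Prometheus text
# exposition on stdin, optionally filtered by a label substring.
metric_value() {
  awk -v name="$1" -v labels="${2:-}" '
    /^#/ { next }                         # skip HELP and TYPE lines
    NF == 0 { next }                      # skip blank lines
    {
      series = $0
      sub(/[ \t]+[^ \t]+$/, "", series)   # drop the trailing value
      base = series
      sub(/[{].*/, "", base)              # drop the label set, if any
      if (base != name) next
      if (labels != "") if (index(series, labels) == 0) next
      print $NF
    }'
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For example, &lt;code&gt;curl -s http:&#x2F;&#x2F;fatty:9199&#x2F;ups_metrics | metric_value network_ups_tools_ups_status 'flag=&quot;OB&quot;'&lt;&#x2F;code&gt; prints &lt;code&gt;1&lt;&#x2F;code&gt; while that UPS is on battery.&lt;&#x2F;p&gt;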
&lt;h2 id=&quot;the-metrics-worth-caring-about&quot;&gt;The metrics worth caring about&lt;&#x2F;h2&gt;
&lt;p&gt;The exporter produces a useful mix of identity, status, load, battery, and electrical metrics.&lt;&#x2F;p&gt;
&lt;p&gt;Device identity comes through as labels:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;network_ups_tools_device_info{mfr=&amp;quot;EATON&amp;quot;,model=&amp;quot;Ellipse PRO 1600&amp;quot;,serial=&amp;quot;0&amp;quot;,type=&amp;quot;ups&amp;quot;} 1
network_ups_tools_device_info{mfr=&amp;quot;EATON&amp;quot;,model=&amp;quot;Eaton 3S 550&amp;quot;,serial=&amp;quot;Blank&amp;quot;,type=&amp;quot;ups&amp;quot;} 1
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The most important metric family is the status flags:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;&#x2F;th&gt;&lt;th&gt;Meaning&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_ups_status{flag=&quot;OL&quot;}&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Online, mains power present&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_ups_status{flag=&quot;OB&quot;}&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;On battery&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_ups_status{flag=&quot;LB&quot;}&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Low battery&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_ups_status{flag=&quot;CHRG&quot;}&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Charging&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_ups_status{flag=&quot;DISCHRG&quot;}&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Discharging&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_ups_status{flag=&quot;BOOST&quot;}&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Boosting low input voltage&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_ups_status{flag=&quot;TRIM&quot;}&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Trimming high input voltage&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_ups_status{flag=&quot;RB&quot;}&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Replace battery&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;Under normal conditions, &lt;code&gt;OL=1&lt;&#x2F;code&gt; and the other flags are &lt;code&gt;0&lt;&#x2F;code&gt;, apart from &lt;code&gt;CHRG=1&lt;&#x2F;code&gt; while the battery is recharging. During an outage you should see &lt;code&gt;OB=1&lt;&#x2F;code&gt;, &lt;code&gt;OL=0&lt;&#x2F;code&gt;, and usually &lt;code&gt;DISCHRG=1&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
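&lt;p&gt;Because each flag is its own series, it can be handy to collapse them back into the compact form &lt;code&gt;upsc&lt;&#x2F;code&gt; prints. A minimal &lt;code&gt;awk&lt;&#x2F;code&gt; sketch, assuming the label is named &lt;code&gt;flag&lt;&#x2F;code&gt; as in the table above. The function name is hypothetical.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: print the status flags currently set to 1,
# space-separated, in the order the exporter emits them.
active_ups_flags() {
  awk '
    /^network_ups_tools_ups_status[{]/ {
      if ($NF != 1) next                            # flag present but not set
      if (match($0, /flag="[^"]*"/)) {
        flag = substr($0, RSTART + 6, RLENGTH - 7)  # text between the quotes
        out = (out == "" ? flag : out " " flag)
      }
    }
    END { print out }'
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Piping &lt;code&gt;curl -s http:&#x2F;&#x2F;chronos:9199&#x2F;ups_metrics&lt;&#x2F;code&gt; into it should print &lt;code&gt;OL&lt;&#x2F;code&gt; (possibly with &lt;code&gt;CHRG&lt;&#x2F;code&gt;) on a quiet day and include &lt;code&gt;OB&lt;&#x2F;code&gt; during an outage.&lt;&#x2F;p&gt;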
&lt;p&gt;The rest are the practical numbers:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;&#x2F;th&gt;&lt;th&gt;Unit&lt;&#x2F;th&gt;&lt;th&gt;Why it matters&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_ups_realpower&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;watts&lt;&#x2F;td&gt;&lt;td&gt;Actual power draw&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_ups_power&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;VA&lt;&#x2F;td&gt;&lt;td&gt;Apparent power&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_ups_power_nominal&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;VA&lt;&#x2F;td&gt;&lt;td&gt;Rated UPS capacity&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_ups_load&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;percent&lt;&#x2F;td&gt;&lt;td&gt;Capacity percentage in use&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_battery_charge&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;percent&lt;&#x2F;td&gt;&lt;td&gt;Battery state of charge&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_battery_runtime&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;seconds&lt;&#x2F;td&gt;&lt;td&gt;Estimated remaining runtime&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_battery_voltage&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;volts&lt;&#x2F;td&gt;&lt;td&gt;Battery voltage&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_input_voltage&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;volts&lt;&#x2F;td&gt;&lt;td&gt;Incoming mains voltage&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_output_voltage&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;volts&lt;&#x2F;td&gt;&lt;td&gt;UPS output voltage&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;network_ups_tools_output_frequency&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;hertz&lt;&#x2F;td&gt;&lt;td&gt;Output frequency&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;A handy derived value for dashboards is capacity utilization:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;promql&quot; class=&quot;language-promql &quot;&gt;&lt;code class=&quot;language-promql&quot; data-lang=&quot;promql&quot;&gt;(network_ups_tools_ups_realpower * 100) &amp;#x2F; network_ups_tools_ups_power_nominal
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
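&lt;p&gt;That is often more honest than the vendor’s own load percentage, especially when you want a quick visual threshold against the unit’s rated capacity. Keep in mind that it divides real power in watts by an apparent-power rating in VA, so it understates utilization whenever the unit’s watt rating is lower than its VA rating. If your UPS reports &lt;code&gt;ups.realpower.nominal&lt;&#x2F;code&gt; and you add it to &lt;code&gt;--nut.vars_enable&lt;&#x2F;code&gt;, dividing by that gives a like-for-like figure.&lt;&#x2F;p&gt;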
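&lt;p&gt;The arithmetic is easy to sanity-check by hand, and doing so shows why the denominator matters: UPS units are typically rated for fewer watts than VA. A sketch with illustrative numbers. The 1000 W real-power rating is hypothetical, and so is the function name.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: percentage of a rating in use, one decimal place.
utilization_pct() {
  awk -v used="$1" -v rating="$2" 'BEGIN {
    if (rating + 0 == 0) { print "n/a"; exit }   # avoid dividing by zero
    printf "%.1f\n", used * 100 / rating
  }'
}

utilization_pct 400 1600   # watts over a VA rating, as in the query: 25.0
utilization_pct 400 1000   # watts over a watt rating: 40.0
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;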
&lt;p&gt;One cross-model wrinkle: not every UPS exposes the same variables. My Eaton Ellipse PRO 1600 reports things like &lt;code&gt;input.voltage&lt;&#x2F;code&gt;, &lt;code&gt;output.frequency&lt;&#x2F;code&gt;, and &lt;code&gt;ups.power&lt;&#x2F;code&gt;. The smaller Eaton 3S 550 does not expose all of those. That is normal. The exporter simply omits missing variables instead of inventing zeros.&lt;&#x2F;p&gt;
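&lt;p&gt;To see which whitelisted variables a particular unit actually reports, compare the exporter’s list with what &lt;code&gt;upsc&lt;&#x2F;code&gt; returns. A sketch, assuming &lt;code&gt;upsc&lt;&#x2F;code&gt; prints its usual &lt;code&gt;name: value&lt;&#x2F;code&gt; lines. The function name is hypothetical.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: print each variable in a comma-separated whitelist
# that the given UPS does not report.
missing_vars() {
  local ups="$1" wanted="$2" all have v
  all=$(upsc "$ups" 2>/dev/null) || return 1
  # "battery.charge: 100" keeps only the variable name.
  have=$(printf '%s\n' "$all" | cut -d: -f1)
  printf '%s\n' "$wanted" | tr ',' '\n' | while read -r v; do
    printf '%s\n' "$have" | grep -qxF "$v" || echo "$v"
  done
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Running it with the same comma-separated list passed to &lt;code&gt;--nut.vars_enable&lt;&#x2F;code&gt; prints one missing variable per line, which is a quick way to explain a blank Grafana panel for one unit.&lt;&#x2F;p&gt;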
&lt;h2 id=&quot;scraping-it-with-prometheus&quot;&gt;Scraping it with Prometheus&lt;&#x2F;h2&gt;
&lt;p&gt;This is the scrape configuration on the monitoring host:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ network, ... }:

{
  services.prometheus.scrapeConfigs = [
    {
      job_name = &amp;quot;nut&amp;quot;;
      honor_labels = true;
      metrics_path = &amp;quot;&amp;#x2F;ups_metrics&amp;quot;;
      static_configs = [
        {
          targets = [ &amp;quot;${network.hosts.fatty.ip}:9199&amp;quot; ];
          labels = { instance = &amp;quot;fatty&amp;quot;; };
        }
        {
          targets = [ &amp;quot;${network.hosts.chronos-wired.ip}:9199&amp;quot; ];
          labels = { instance = &amp;quot;chronos&amp;quot;; };
        }
      ];
    }
  ];
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Again, the important line is &lt;code&gt;metrics_path = &quot;&#x2F;ups_metrics&quot;&lt;&#x2F;code&gt;. Everything else is ordinary Prometheus.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;building-the-grafana-dashboard&quot;&gt;Building the Grafana dashboard&lt;&#x2F;h2&gt;
&lt;p&gt;The dashboard I wanted was straightforward:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;status at the top so I can see online vs on-battery immediately&lt;&#x2F;li&gt;
&lt;li&gt;current voltage, load, runtime, and beeper state as stat panels&lt;&#x2F;li&gt;
&lt;li&gt;smoothed time series for power, battery charge, load, runtime, and voltages&lt;&#x2F;li&gt;
&lt;li&gt;a variable for selecting one or more UPS instances&lt;&#x2F;li&gt;
&lt;li&gt;a smoothing interval variable so noisy readings do not turn every graph into static&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The useful template variables are:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;$instance&lt;&#x2F;code&gt;, populated from &lt;code&gt;label_values(network_ups_tools_ups_load, instance)&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;$smoothing_interval&lt;&#x2F;code&gt;, with values like &lt;code&gt;1m&lt;&#x2F;code&gt;, &lt;code&gt;5m&lt;&#x2F;code&gt;, &lt;code&gt;15m&lt;&#x2F;code&gt;, &lt;code&gt;1h&lt;&#x2F;code&gt;, &lt;code&gt;1d&lt;&#x2F;code&gt;, and &lt;code&gt;1w&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;For the time series panels I use &lt;code&gt;avg_over_time(...[$smoothing_interval])&lt;&#x2F;code&gt;. UPS readings are often twitchy enough that raw charts are less helpful than a short moving average.&lt;&#x2F;p&gt;
&lt;p&gt;Here is the full dashboard JSON:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;json&quot; class=&quot;language-json &quot;&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;{
  &amp;quot;annotations&amp;quot;: { &amp;quot;list&amp;quot;: [] },
  &amp;quot;description&amp;quot;: &amp;quot;UPS monitoring via NUT exporter (DRuggeri)&amp;quot;,
  &amp;quot;editable&amp;quot;: true,
  &amp;quot;fiscalYearStartMonth&amp;quot;: 0,
  &amp;quot;graphTooltip&amp;quot;: 1,
  &amp;quot;links&amp;quot;: [],
  &amp;quot;panels&amp;quot;: [
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;mappings&amp;quot;: [
            {
              &amp;quot;options&amp;quot;: {
                &amp;quot;0&amp;quot;: { &amp;quot;text&amp;quot;: &amp;quot;Offline&amp;quot;, &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot; },
                &amp;quot;1&amp;quot;: { &amp;quot;text&amp;quot;: &amp;quot;Online&amp;quot;, &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot; }
              },
              &amp;quot;type&amp;quot;: &amp;quot;value&amp;quot;
            }
          ],
          &amp;quot;thresholds&amp;quot;: {
            &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;,
            &amp;quot;steps&amp;quot;: [
              { &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot;, &amp;quot;value&amp;quot;: null },
              { &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: 1 }
            ]
          }
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 6, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 0 },
      &amp;quot;id&amp;quot;: 1,
      &amp;quot;options&amp;quot;: {
        &amp;quot;colorMode&amp;quot;: &amp;quot;value&amp;quot;,
        &amp;quot;graphMode&amp;quot;: &amp;quot;none&amp;quot;,
        &amp;quot;justifyMode&amp;quot;: &amp;quot;auto&amp;quot;,
        &amp;quot;orientation&amp;quot;: &amp;quot;auto&amp;quot;,
        &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;], &amp;quot;fields&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;values&amp;quot;: false },
        &amp;quot;textMode&amp;quot;: &amp;quot;auto&amp;quot;
      },
      &amp;quot;targets&amp;quot;: [
        {
          &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
          &amp;quot;expr&amp;quot;: &amp;quot;network_ups_tools_ups_status{instance=~\&amp;quot;$instance\&amp;quot;, flag=\&amp;quot;OL\&amp;quot;}&amp;quot;,
          &amp;quot;instant&amp;quot;: true,
          &amp;quot;legendFormat&amp;quot;: &amp;quot;{{instance}}&amp;quot;,
          &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot;
        }
      ],
      &amp;quot;title&amp;quot;: &amp;quot;Status&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;thresholds&amp;quot;: {
            &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;,
            &amp;quot;steps&amp;quot;: [{ &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: null }]
          },
          &amp;quot;unit&amp;quot;: &amp;quot;volt&amp;quot;
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 4, &amp;quot;x&amp;quot;: 6, &amp;quot;y&amp;quot;: 0 },
      &amp;quot;id&amp;quot;: 2,
      &amp;quot;options&amp;quot;: {
        &amp;quot;colorMode&amp;quot;: &amp;quot;value&amp;quot;,
        &amp;quot;graphMode&amp;quot;: &amp;quot;none&amp;quot;,
        &amp;quot;justifyMode&amp;quot;: &amp;quot;auto&amp;quot;,
        &amp;quot;orientation&amp;quot;: &amp;quot;auto&amp;quot;,
        &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;], &amp;quot;fields&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;values&amp;quot;: false },
        &amp;quot;textMode&amp;quot;: &amp;quot;auto&amp;quot;
      },
      &amp;quot;targets&amp;quot;: [
        {
          &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
          &amp;quot;expr&amp;quot;: &amp;quot;network_ups_tools_input_voltage{instance=~\&amp;quot;$instance\&amp;quot;}&amp;quot;,
          &amp;quot;instant&amp;quot;: true,
          &amp;quot;legendFormat&amp;quot;: &amp;quot;{{instance}}&amp;quot;,
          &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot;
        }
      ],
      &amp;quot;title&amp;quot;: &amp;quot;Input Voltage&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;decimals&amp;quot;: 0,
          &amp;quot;max&amp;quot;: 100,
          &amp;quot;min&amp;quot;: 0,
          &amp;quot;thresholds&amp;quot;: {
            &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;,
            &amp;quot;steps&amp;quot;: [
              { &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: null },
              { &amp;quot;color&amp;quot;: &amp;quot;#EAB839&amp;quot;, &amp;quot;value&amp;quot;: 50 },
              { &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot;, &amp;quot;value&amp;quot;: 75 }
            ]
          },
          &amp;quot;unit&amp;quot;: &amp;quot;percent&amp;quot;
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 4, &amp;quot;x&amp;quot;: 10, &amp;quot;y&amp;quot;: 0 },
      &amp;quot;id&amp;quot;: 3,
      &amp;quot;options&amp;quot;: {
        &amp;quot;colorMode&amp;quot;: &amp;quot;value&amp;quot;,
        &amp;quot;graphMode&amp;quot;: &amp;quot;none&amp;quot;,
        &amp;quot;justifyMode&amp;quot;: &amp;quot;auto&amp;quot;,
        &amp;quot;orientation&amp;quot;: &amp;quot;auto&amp;quot;,
        &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;], &amp;quot;fields&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;values&amp;quot;: false },
        &amp;quot;textMode&amp;quot;: &amp;quot;auto&amp;quot;
      },
      &amp;quot;targets&amp;quot;: [
        {
          &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
          &amp;quot;expr&amp;quot;: &amp;quot;network_ups_tools_ups_load{instance=~\&amp;quot;$instance\&amp;quot;}&amp;quot;,
          &amp;quot;instant&amp;quot;: true,
          &amp;quot;legendFormat&amp;quot;: &amp;quot;{{instance}}&amp;quot;,
          &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot;
        }
      ],
      &amp;quot;title&amp;quot;: &amp;quot;Load&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;thresholds&amp;quot;: {
            &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;,
            &amp;quot;steps&amp;quot;: [
              { &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot;, &amp;quot;value&amp;quot;: null },
              { &amp;quot;color&amp;quot;: &amp;quot;orange&amp;quot;, &amp;quot;value&amp;quot;: 300 },
              { &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: 600 }
            ]
          },
          &amp;quot;unit&amp;quot;: &amp;quot;s&amp;quot;
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 4, &amp;quot;x&amp;quot;: 14, &amp;quot;y&amp;quot;: 0 },
      &amp;quot;id&amp;quot;: 4,
      &amp;quot;options&amp;quot;: {
        &amp;quot;colorMode&amp;quot;: &amp;quot;value&amp;quot;,
        &amp;quot;graphMode&amp;quot;: &amp;quot;none&amp;quot;,
        &amp;quot;justifyMode&amp;quot;: &amp;quot;auto&amp;quot;,
        &amp;quot;orientation&amp;quot;: &amp;quot;auto&amp;quot;,
        &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;], &amp;quot;fields&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;values&amp;quot;: false },
        &amp;quot;textMode&amp;quot;: &amp;quot;auto&amp;quot;
      },
      &amp;quot;targets&amp;quot;: [
        {
          &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
          &amp;quot;expr&amp;quot;: &amp;quot;network_ups_tools_battery_runtime{instance=~\&amp;quot;$instance\&amp;quot;}&amp;quot;,
          &amp;quot;instant&amp;quot;: true,
          &amp;quot;legendFormat&amp;quot;: &amp;quot;{{instance}}&amp;quot;,
          &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot;
        }
      ],
      &amp;quot;title&amp;quot;: &amp;quot;Runtime&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;mappings&amp;quot;: [
            {
              &amp;quot;options&amp;quot;: {
                &amp;quot;0&amp;quot;: { &amp;quot;text&amp;quot;: &amp;quot;Disabled&amp;quot;, &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot; },
                &amp;quot;1&amp;quot;: { &amp;quot;text&amp;quot;: &amp;quot;Enabled&amp;quot;, &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot; }
              },
              &amp;quot;type&amp;quot;: &amp;quot;value&amp;quot;
            }
          ],
          &amp;quot;thresholds&amp;quot;: {
            &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;,
            &amp;quot;steps&amp;quot;: [
              { &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot;, &amp;quot;value&amp;quot;: null },
              { &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: 1 }
            ]
          }
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 4, &amp;quot;w&amp;quot;: 6, &amp;quot;x&amp;quot;: 18, &amp;quot;y&amp;quot;: 0 },
      &amp;quot;id&amp;quot;: 5,
      &amp;quot;options&amp;quot;: {
        &amp;quot;colorMode&amp;quot;: &amp;quot;value&amp;quot;,
        &amp;quot;graphMode&amp;quot;: &amp;quot;none&amp;quot;,
        &amp;quot;justifyMode&amp;quot;: &amp;quot;auto&amp;quot;,
        &amp;quot;orientation&amp;quot;: &amp;quot;auto&amp;quot;,
        &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;], &amp;quot;fields&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;values&amp;quot;: false },
        &amp;quot;textMode&amp;quot;: &amp;quot;auto&amp;quot;
      },
      &amp;quot;targets&amp;quot;: [
        {
          &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
          &amp;quot;expr&amp;quot;: &amp;quot;network_ups_tools_ups_beeper_status{instance=~\&amp;quot;$instance\&amp;quot;}&amp;quot;,
          &amp;quot;instant&amp;quot;: true,
          &amp;quot;legendFormat&amp;quot;: &amp;quot;{{instance}}&amp;quot;,
          &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot;
        }
      ],
      &amp;quot;title&amp;quot;: &amp;quot;Beeper Status&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;stat&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; },
          &amp;quot;custom&amp;quot;: {
            &amp;quot;axisBorderShow&amp;quot;: false,
            &amp;quot;axisCenteredZero&amp;quot;: false,
            &amp;quot;axisColorMode&amp;quot;: &amp;quot;text&amp;quot;,
            &amp;quot;axisPlacement&amp;quot;: &amp;quot;auto&amp;quot;,
            &amp;quot;drawStyle&amp;quot;: &amp;quot;line&amp;quot;,
            &amp;quot;fillOpacity&amp;quot;: 10,
            &amp;quot;gradientMode&amp;quot;: &amp;quot;opacity&amp;quot;,
            &amp;quot;lineInterpolation&amp;quot;: &amp;quot;linear&amp;quot;,
            &amp;quot;lineWidth&amp;quot;: 1,
            &amp;quot;pointSize&amp;quot;: 5,
            &amp;quot;showPoints&amp;quot;: &amp;quot;never&amp;quot;,
            &amp;quot;spanNulls&amp;quot;: true,
            &amp;quot;stacking&amp;quot;: { &amp;quot;group&amp;quot;: &amp;quot;A&amp;quot;, &amp;quot;mode&amp;quot;: &amp;quot;none&amp;quot; },
            &amp;quot;thresholdsStyle&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;off&amp;quot; }
          },
          &amp;quot;decimals&amp;quot;: 1,
          &amp;quot;unit&amp;quot;: &amp;quot;watt&amp;quot;
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 10, &amp;quot;w&amp;quot;: 24, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 4 },
      &amp;quot;id&amp;quot;: 6,
      &amp;quot;options&amp;quot;: {
        &amp;quot;legend&amp;quot;: {
          &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;, &amp;quot;mean&amp;quot;],
          &amp;quot;displayMode&amp;quot;: &amp;quot;table&amp;quot;,
          &amp;quot;placement&amp;quot;: &amp;quot;bottom&amp;quot;,
          &amp;quot;showLegend&amp;quot;: true
        },
        &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot;, &amp;quot;sort&amp;quot;: &amp;quot;none&amp;quot; }
      },
      &amp;quot;targets&amp;quot;: [
        {
          &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
          &amp;quot;expr&amp;quot;: &amp;quot;avg_over_time(network_ups_tools_ups_realpower{instance=~\&amp;quot;$instance\&amp;quot;}[$smoothing_interval])&amp;quot;,
          &amp;quot;legendFormat&amp;quot;: &amp;quot;{{instance}}&amp;quot;,
          &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot;
        }
      ],
      &amp;quot;title&amp;quot;: &amp;quot;Power Consumption [$smoothing_interval]&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; },
          &amp;quot;custom&amp;quot;: {
            &amp;quot;axisBorderShow&amp;quot;: false,
            &amp;quot;axisCenteredZero&amp;quot;: false,
            &amp;quot;axisColorMode&amp;quot;: &amp;quot;text&amp;quot;,
            &amp;quot;axisPlacement&amp;quot;: &amp;quot;auto&amp;quot;,
            &amp;quot;drawStyle&amp;quot;: &amp;quot;line&amp;quot;,
            &amp;quot;fillOpacity&amp;quot;: 10,
            &amp;quot;gradientMode&amp;quot;: &amp;quot;opacity&amp;quot;,
            &amp;quot;lineInterpolation&amp;quot;: &amp;quot;linear&amp;quot;,
            &amp;quot;lineWidth&amp;quot;: 1,
            &amp;quot;pointSize&amp;quot;: 5,
            &amp;quot;showPoints&amp;quot;: &amp;quot;never&amp;quot;,
            &amp;quot;spanNulls&amp;quot;: true,
            &amp;quot;stacking&amp;quot;: { &amp;quot;group&amp;quot;: &amp;quot;A&amp;quot;, &amp;quot;mode&amp;quot;: &amp;quot;none&amp;quot; },
            &amp;quot;thresholdsStyle&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;off&amp;quot; }
          },
          &amp;quot;decimals&amp;quot;: 0,
          &amp;quot;max&amp;quot;: 100,
          &amp;quot;min&amp;quot;: 0,
          &amp;quot;unit&amp;quot;: &amp;quot;percent&amp;quot;
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 6, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 14 },
      &amp;quot;id&amp;quot;: 7,
      &amp;quot;options&amp;quot;: {
        &amp;quot;legend&amp;quot;: {
          &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;],
          &amp;quot;displayMode&amp;quot;: &amp;quot;list&amp;quot;,
          &amp;quot;placement&amp;quot;: &amp;quot;bottom&amp;quot;,
          &amp;quot;showLegend&amp;quot;: true
        },
        &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot;, &amp;quot;sort&amp;quot;: &amp;quot;none&amp;quot; }
      },
      &amp;quot;targets&amp;quot;: [
        {
          &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
          &amp;quot;expr&amp;quot;: &amp;quot;avg_over_time(network_ups_tools_battery_charge{instance=~\&amp;quot;$instance\&amp;quot;}[$smoothing_interval])&amp;quot;,
          &amp;quot;legendFormat&amp;quot;: &amp;quot;{{instance}}&amp;quot;,
          &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot;
        }
      ],
      &amp;quot;title&amp;quot;: &amp;quot;Battery Charge [$smoothing_interval]&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;thresholds&amp;quot; },
          &amp;quot;max&amp;quot;: 100,
          &amp;quot;min&amp;quot;: 0,
          &amp;quot;thresholds&amp;quot;: {
            &amp;quot;mode&amp;quot;: &amp;quot;absolute&amp;quot;,
            &amp;quot;steps&amp;quot;: [
              { &amp;quot;color&amp;quot;: &amp;quot;green&amp;quot;, &amp;quot;value&amp;quot;: null },
              { &amp;quot;color&amp;quot;: &amp;quot;red&amp;quot;, &amp;quot;value&amp;quot;: 80 }
            ]
          },
          &amp;quot;unit&amp;quot;: &amp;quot;percent&amp;quot;
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 6, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 12, &amp;quot;y&amp;quot;: 14 },
      &amp;quot;id&amp;quot;: 8,
      &amp;quot;options&amp;quot;: {
        &amp;quot;minVizHeight&amp;quot;: 75,
        &amp;quot;minVizWidth&amp;quot;: 75,
        &amp;quot;orientation&amp;quot;: &amp;quot;auto&amp;quot;,
        &amp;quot;reduceOptions&amp;quot;: { &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;], &amp;quot;fields&amp;quot;: &amp;quot;&amp;quot;, &amp;quot;values&amp;quot;: false },
        &amp;quot;showThresholdLabels&amp;quot;: false,
        &amp;quot;showThresholdMarkers&amp;quot;: true,
        &amp;quot;sizing&amp;quot;: &amp;quot;auto&amp;quot;
      },
      &amp;quot;targets&amp;quot;: [
        {
          &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
          &amp;quot;expr&amp;quot;: &amp;quot;(network_ups_tools_ups_realpower{instance=~\&amp;quot;$instance\&amp;quot;} * 100) &amp;#x2F; network_ups_tools_ups_power_nominal{instance=~\&amp;quot;$instance\&amp;quot;}&amp;quot;,
          &amp;quot;legendFormat&amp;quot;: &amp;quot;{{instance}}&amp;quot;,
          &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot;
        }
      ],
      &amp;quot;title&amp;quot;: &amp;quot;Capacity Utilization&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;gauge&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; },
          &amp;quot;custom&amp;quot;: {
            &amp;quot;axisBorderShow&amp;quot;: false,
            &amp;quot;axisCenteredZero&amp;quot;: false,
            &amp;quot;axisColorMode&amp;quot;: &amp;quot;text&amp;quot;,
            &amp;quot;axisPlacement&amp;quot;: &amp;quot;auto&amp;quot;,
            &amp;quot;drawStyle&amp;quot;: &amp;quot;line&amp;quot;,
            &amp;quot;fillOpacity&amp;quot;: 10,
            &amp;quot;gradientMode&amp;quot;: &amp;quot;opacity&amp;quot;,
            &amp;quot;lineInterpolation&amp;quot;: &amp;quot;linear&amp;quot;,
            &amp;quot;lineWidth&amp;quot;: 1,
            &amp;quot;pointSize&amp;quot;: 5,
            &amp;quot;showPoints&amp;quot;: &amp;quot;never&amp;quot;,
            &amp;quot;spanNulls&amp;quot;: true,
            &amp;quot;stacking&amp;quot;: { &amp;quot;group&amp;quot;: &amp;quot;A&amp;quot;, &amp;quot;mode&amp;quot;: &amp;quot;none&amp;quot; },
            &amp;quot;thresholdsStyle&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;off&amp;quot; }
          },
          &amp;quot;max&amp;quot;: 100,
          &amp;quot;min&amp;quot;: 0,
          &amp;quot;unit&amp;quot;: &amp;quot;percent&amp;quot;
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 6, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 20 },
      &amp;quot;id&amp;quot;: 9,
      &amp;quot;options&amp;quot;: {
        &amp;quot;legend&amp;quot;: {
          &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;],
          &amp;quot;displayMode&amp;quot;: &amp;quot;list&amp;quot;,
          &amp;quot;placement&amp;quot;: &amp;quot;bottom&amp;quot;,
          &amp;quot;showLegend&amp;quot;: true
        },
        &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot;, &amp;quot;sort&amp;quot;: &amp;quot;none&amp;quot; }
      },
      &amp;quot;targets&amp;quot;: [
        {
          &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
          &amp;quot;expr&amp;quot;: &amp;quot;avg_over_time(network_ups_tools_ups_load{instance=~\&amp;quot;$instance\&amp;quot;}[$smoothing_interval])&amp;quot;,
          &amp;quot;legendFormat&amp;quot;: &amp;quot;{{instance}}&amp;quot;,
          &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot;
        }
      ],
      &amp;quot;title&amp;quot;: &amp;quot;Load [$smoothing_interval]&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; },
          &amp;quot;custom&amp;quot;: {
            &amp;quot;axisBorderShow&amp;quot;: false,
            &amp;quot;axisCenteredZero&amp;quot;: false,
            &amp;quot;axisColorMode&amp;quot;: &amp;quot;text&amp;quot;,
            &amp;quot;axisPlacement&amp;quot;: &amp;quot;auto&amp;quot;,
            &amp;quot;drawStyle&amp;quot;: &amp;quot;line&amp;quot;,
            &amp;quot;fillOpacity&amp;quot;: 10,
            &amp;quot;gradientMode&amp;quot;: &amp;quot;opacity&amp;quot;,
            &amp;quot;lineInterpolation&amp;quot;: &amp;quot;linear&amp;quot;,
            &amp;quot;lineWidth&amp;quot;: 1,
            &amp;quot;pointSize&amp;quot;: 5,
            &amp;quot;showPoints&amp;quot;: &amp;quot;never&amp;quot;,
            &amp;quot;spanNulls&amp;quot;: true,
            &amp;quot;stacking&amp;quot;: { &amp;quot;group&amp;quot;: &amp;quot;A&amp;quot;, &amp;quot;mode&amp;quot;: &amp;quot;none&amp;quot; },
            &amp;quot;thresholdsStyle&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;off&amp;quot; }
          },
          &amp;quot;unit&amp;quot;: &amp;quot;s&amp;quot;
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 6, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 12, &amp;quot;y&amp;quot;: 20 },
      &amp;quot;id&amp;quot;: 10,
      &amp;quot;options&amp;quot;: {
        &amp;quot;legend&amp;quot;: {
          &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;, &amp;quot;min&amp;quot;],
          &amp;quot;displayMode&amp;quot;: &amp;quot;list&amp;quot;,
          &amp;quot;placement&amp;quot;: &amp;quot;bottom&amp;quot;,
          &amp;quot;showLegend&amp;quot;: true
        },
        &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot;, &amp;quot;sort&amp;quot;: &amp;quot;none&amp;quot; }
      },
      &amp;quot;targets&amp;quot;: [
        {
          &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
          &amp;quot;expr&amp;quot;: &amp;quot;avg_over_time(network_ups_tools_battery_runtime{instance=~\&amp;quot;$instance\&amp;quot;}[$smoothing_interval])&amp;quot;,
          &amp;quot;legendFormat&amp;quot;: &amp;quot;{{instance}}&amp;quot;,
          &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot;
        }
      ],
      &amp;quot;title&amp;quot;: &amp;quot;Runtime [$smoothing_interval]&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; },
          &amp;quot;custom&amp;quot;: {
            &amp;quot;axisBorderShow&amp;quot;: false,
            &amp;quot;axisCenteredZero&amp;quot;: false,
            &amp;quot;axisColorMode&amp;quot;: &amp;quot;text&amp;quot;,
            &amp;quot;axisPlacement&amp;quot;: &amp;quot;auto&amp;quot;,
            &amp;quot;drawStyle&amp;quot;: &amp;quot;line&amp;quot;,
            &amp;quot;fillOpacity&amp;quot;: 10,
            &amp;quot;gradientMode&amp;quot;: &amp;quot;opacity&amp;quot;,
            &amp;quot;lineInterpolation&amp;quot;: &amp;quot;linear&amp;quot;,
            &amp;quot;lineWidth&amp;quot;: 1,
            &amp;quot;pointSize&amp;quot;: 5,
            &amp;quot;showPoints&amp;quot;: &amp;quot;never&amp;quot;,
            &amp;quot;spanNulls&amp;quot;: true,
            &amp;quot;stacking&amp;quot;: { &amp;quot;group&amp;quot;: &amp;quot;A&amp;quot;, &amp;quot;mode&amp;quot;: &amp;quot;none&amp;quot; },
            &amp;quot;thresholdsStyle&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;off&amp;quot; }
          },
          &amp;quot;decimals&amp;quot;: 1,
          &amp;quot;unit&amp;quot;: &amp;quot;volt&amp;quot;
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 6, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 0, &amp;quot;y&amp;quot;: 26 },
      &amp;quot;id&amp;quot;: 11,
      &amp;quot;options&amp;quot;: {
        &amp;quot;legend&amp;quot;: {
          &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;],
          &amp;quot;displayMode&amp;quot;: &amp;quot;list&amp;quot;,
          &amp;quot;placement&amp;quot;: &amp;quot;bottom&amp;quot;,
          &amp;quot;showLegend&amp;quot;: true
        },
        &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot;, &amp;quot;sort&amp;quot;: &amp;quot;none&amp;quot; }
      },
      &amp;quot;targets&amp;quot;: [
        {
          &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
          &amp;quot;expr&amp;quot;: &amp;quot;avg_over_time(network_ups_tools_input_voltage{instance=~\&amp;quot;$instance\&amp;quot;}[$smoothing_interval])&amp;quot;,
          &amp;quot;legendFormat&amp;quot;: &amp;quot;{{instance}}&amp;quot;,
          &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot;
        }
      ],
      &amp;quot;title&amp;quot;: &amp;quot;Input Voltage [$smoothing_interval]&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;
    },
    {
      &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
      &amp;quot;fieldConfig&amp;quot;: {
        &amp;quot;defaults&amp;quot;: {
          &amp;quot;color&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;palette-classic&amp;quot; },
          &amp;quot;custom&amp;quot;: {
            &amp;quot;axisBorderShow&amp;quot;: false,
            &amp;quot;axisCenteredZero&amp;quot;: false,
            &amp;quot;axisColorMode&amp;quot;: &amp;quot;text&amp;quot;,
            &amp;quot;axisPlacement&amp;quot;: &amp;quot;auto&amp;quot;,
            &amp;quot;drawStyle&amp;quot;: &amp;quot;line&amp;quot;,
            &amp;quot;fillOpacity&amp;quot;: 10,
            &amp;quot;gradientMode&amp;quot;: &amp;quot;opacity&amp;quot;,
            &amp;quot;lineInterpolation&amp;quot;: &amp;quot;linear&amp;quot;,
            &amp;quot;lineWidth&amp;quot;: 1,
            &amp;quot;pointSize&amp;quot;: 5,
            &amp;quot;showPoints&amp;quot;: &amp;quot;never&amp;quot;,
            &amp;quot;spanNulls&amp;quot;: true,
            &amp;quot;stacking&amp;quot;: { &amp;quot;group&amp;quot;: &amp;quot;A&amp;quot;, &amp;quot;mode&amp;quot;: &amp;quot;none&amp;quot; },
            &amp;quot;thresholdsStyle&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;off&amp;quot; }
          },
          &amp;quot;decimals&amp;quot;: 1,
          &amp;quot;unit&amp;quot;: &amp;quot;volt&amp;quot;
        },
        &amp;quot;overrides&amp;quot;: []
      },
      &amp;quot;gridPos&amp;quot;: { &amp;quot;h&amp;quot;: 6, &amp;quot;w&amp;quot;: 12, &amp;quot;x&amp;quot;: 12, &amp;quot;y&amp;quot;: 26 },
      &amp;quot;id&amp;quot;: 12,
      &amp;quot;options&amp;quot;: {
        &amp;quot;legend&amp;quot;: {
          &amp;quot;calcs&amp;quot;: [&amp;quot;lastNotNull&amp;quot;],
          &amp;quot;displayMode&amp;quot;: &amp;quot;list&amp;quot;,
          &amp;quot;placement&amp;quot;: &amp;quot;bottom&amp;quot;,
          &amp;quot;showLegend&amp;quot;: true
        },
        &amp;quot;tooltip&amp;quot;: { &amp;quot;mode&amp;quot;: &amp;quot;multi&amp;quot;, &amp;quot;sort&amp;quot;: &amp;quot;none&amp;quot; }
      },
      &amp;quot;targets&amp;quot;: [
        {
          &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
          &amp;quot;expr&amp;quot;: &amp;quot;avg_over_time(network_ups_tools_output_voltage{instance=~\&amp;quot;$instance\&amp;quot;}[$smoothing_interval])&amp;quot;,
          &amp;quot;legendFormat&amp;quot;: &amp;quot;{{instance}}&amp;quot;,
          &amp;quot;refId&amp;quot;: &amp;quot;A&amp;quot;
        }
      ],
      &amp;quot;title&amp;quot;: &amp;quot;Output Voltage [$smoothing_interval]&amp;quot;,
      &amp;quot;type&amp;quot;: &amp;quot;timeseries&amp;quot;
    }
  ],
  &amp;quot;preload&amp;quot;: false,
  &amp;quot;refresh&amp;quot;: &amp;quot;10s&amp;quot;,
  &amp;quot;schemaVersion&amp;quot;: 42,
  &amp;quot;tags&amp;quot;: [],
  &amp;quot;templating&amp;quot;: {
    &amp;quot;list&amp;quot;: [
      {
        &amp;quot;current&amp;quot;: {},
        &amp;quot;includeAll&amp;quot;: false,
        &amp;quot;multi&amp;quot;: false,
        &amp;quot;name&amp;quot;: &amp;quot;datasource&amp;quot;,
        &amp;quot;query&amp;quot;: &amp;quot;prometheus&amp;quot;,
        &amp;quot;refresh&amp;quot;: 1,
        &amp;quot;type&amp;quot;: &amp;quot;datasource&amp;quot;
      },
      {
        &amp;quot;current&amp;quot;: { &amp;quot;text&amp;quot;: [&amp;quot;All&amp;quot;], &amp;quot;value&amp;quot;: [&amp;quot;$__all&amp;quot;] },
        &amp;quot;datasource&amp;quot;: { &amp;quot;type&amp;quot;: &amp;quot;prometheus&amp;quot;, &amp;quot;uid&amp;quot;: &amp;quot;${datasource}&amp;quot; },
        &amp;quot;definition&amp;quot;: &amp;quot;label_values(network_ups_tools_ups_load, instance)&amp;quot;,
        &amp;quot;includeAll&amp;quot;: true,
        &amp;quot;label&amp;quot;: &amp;quot;Instance&amp;quot;,
        &amp;quot;multi&amp;quot;: true,
        &amp;quot;name&amp;quot;: &amp;quot;instance&amp;quot;,
        &amp;quot;query&amp;quot;: { &amp;quot;query&amp;quot;: &amp;quot;label_values(network_ups_tools_ups_load, instance)&amp;quot;, &amp;quot;refId&amp;quot;: &amp;quot;StandardVariableQuery&amp;quot; },
        &amp;quot;refresh&amp;quot;: 1,
        &amp;quot;sort&amp;quot;: 2,
        &amp;quot;type&amp;quot;: &amp;quot;query&amp;quot;
      },
      {
        &amp;quot;current&amp;quot;: { &amp;quot;text&amp;quot;: &amp;quot;15m&amp;quot;, &amp;quot;value&amp;quot;: &amp;quot;15m&amp;quot; },
        &amp;quot;includeAll&amp;quot;: false,
        &amp;quot;label&amp;quot;: &amp;quot;Smoothing&amp;quot;,
        &amp;quot;name&amp;quot;: &amp;quot;smoothing_interval&amp;quot;,
        &amp;quot;options&amp;quot;: [
          { &amp;quot;selected&amp;quot;: false, &amp;quot;text&amp;quot;: &amp;quot;1m&amp;quot;, &amp;quot;value&amp;quot;: &amp;quot;1m&amp;quot; },
          { &amp;quot;selected&amp;quot;: false, &amp;quot;text&amp;quot;: &amp;quot;5m&amp;quot;, &amp;quot;value&amp;quot;: &amp;quot;5m&amp;quot; },
          { &amp;quot;selected&amp;quot;: true, &amp;quot;text&amp;quot;: &amp;quot;15m&amp;quot;, &amp;quot;value&amp;quot;: &amp;quot;15m&amp;quot; },
          { &amp;quot;selected&amp;quot;: false, &amp;quot;text&amp;quot;: &amp;quot;1h&amp;quot;, &amp;quot;value&amp;quot;: &amp;quot;1h&amp;quot; },
          { &amp;quot;selected&amp;quot;: false, &amp;quot;text&amp;quot;: &amp;quot;1d&amp;quot;, &amp;quot;value&amp;quot;: &amp;quot;1d&amp;quot; },
          { &amp;quot;selected&amp;quot;: false, &amp;quot;text&amp;quot;: &amp;quot;1w&amp;quot;, &amp;quot;value&amp;quot;: &amp;quot;1w&amp;quot; }
        ],
        &amp;quot;query&amp;quot;: &amp;quot;1m,5m,15m,1h,1d,1w&amp;quot;,
        &amp;quot;type&amp;quot;: &amp;quot;custom&amp;quot;
      }
    ]
  },
  &amp;quot;time&amp;quot;: { &amp;quot;from&amp;quot;: &amp;quot;now-7d&amp;quot;, &amp;quot;to&amp;quot;: &amp;quot;now&amp;quot; },
  &amp;quot;timepicker&amp;quot;: {},
  &amp;quot;timezone&amp;quot;: &amp;quot;&amp;quot;,
  &amp;quot;title&amp;quot;: &amp;quot;NUT UPS&amp;quot;,
  &amp;quot;uid&amp;quot;: &amp;quot;nut-ups&amp;quot;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;quick-health-checks&quot;&gt;Quick health checks&lt;&#x2F;h2&gt;
&lt;p&gt;These are the commands I actually want handy when something looks wrong:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# On the UPS host
upsc ups@localhost
upsc ups@localhost ups.status
upsc ups@localhost battery.charge
upsc ups@localhost ups.load
curl -s http:&amp;#x2F;&amp;#x2F;localhost:9199&amp;#x2F;ups_metrics | grep network_ups_tools_ups_status

# From the monitoring host
curl -s http:&amp;#x2F;&amp;#x2F;fatty:9199&amp;#x2F;ups_metrics | grep ups_load
curl -s http:&amp;#x2F;&amp;#x2F;chronos:9199&amp;#x2F;ups_metrics | grep ups_load

# Check for on-battery state
curl -s http:&amp;#x2F;&amp;#x2F;fatty:9199&amp;#x2F;ups_metrics | grep &amp;#x27;flag=&amp;quot;OB&amp;quot;&amp;#x27;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If &lt;code&gt;OB&lt;&#x2F;code&gt; flips to &lt;code&gt;1&lt;&#x2F;code&gt;, you are on battery. If &lt;code&gt;LB&lt;&#x2F;code&gt; joins it, stop admiring the dashboard and start caring about shutdown sequencing.&lt;&#x2F;p&gt;
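&lt;p&gt;If you want a single verdict per host instead of reading grep output, a small shell function can fold the flags into one word. This is a sketch, not something NUT or the exporter provides. &lt;code&gt;ups_state&lt;&#x2F;code&gt; is a hypothetical name, and it assumes each status sample looks like the &lt;code&gt;flag=&quot;OB&quot;&lt;&#x2F;code&gt; lines grepped for above, with the value as the last field.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: classify UPS state from exporter metrics on stdin.
# Assumes samples like: network_ups_tools_ups_status{flag="OB"} 1
ups_state() {
  # Collect every flag whose value is currently 1, space-separated
  flags=$(awk '/^network_ups_tools_ups_status[{]/ {
      if ($NF == 1) {
        if (match($0, /flag="[A-Z]+"/)) print substr($0, RSTART + 6, RLENGTH - 7)
      }
    }' | tr '\n' ' ')
  # LB is checked first because it matters more than OB alone
  case " $flags " in
    *" LB "*) echo "low-battery" ;;
    *" OB "*) echo "on-battery" ;;
    *) echo "ok" ;;
  esac
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Usage is just another pipe, for example &lt;code&gt;curl -s http:&#x2F;&#x2F;fatty:9199&#x2F;ups_metrics | ups_state&lt;&#x2F;code&gt;. That makes it easy to loop over both hosts from the monitoring box.&lt;&#x2F;p&gt;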
&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping up&lt;&#x2F;h2&gt;
&lt;p&gt;This stack is not complicated, but it is full of little edges that are easy to forget six months later: FreeBSD path differences, NUT service naming, the exporter’s &lt;code&gt;&#x2F;ups_metrics&lt;&#x2F;code&gt; path, and the fact that different UPS models expose different variables.&lt;&#x2F;p&gt;
&lt;p&gt;Once those are handled, though, UPS monitoring becomes just another Prometheus job. You get trend data for load, battery, and voltage, quick confirmation that a host is still online, and a much better answer than “the UPS is making noises again.”&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Monitoring Gardena on NixOS with prometheus-gardena-exporter</title>
        <published>2026-04-07T00:10:00+00:00</published>
        <updated>2026-04-07T00:10:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/prometheus-gardena-exporter-nixos/"/>
        <id>https://perlpimp.net/blog/prometheus-gardena-exporter-nixos/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/prometheus-gardena-exporter-nixos/">&lt;p&gt;Your Gardena setup already knows useful things. Soil humidity. Soil temperature. Whether a valve is open. Whether a sensor battery is about to ruin your weekend. The problem is that all of it lives inside a phone app, which is a terrible place for operational data.&lt;&#x2F;p&gt;
&lt;p&gt;If you already run Prometheus and Grafana, that is where this belongs.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;ijohanne&#x2F;prometheus-gardena-exporter&quot;&gt;prometheus-gardena-exporter&lt;&#x2F;a&gt; is a Rust exporter for the Gardena smart system API. It handles the OAuth client credentials flow, discovers your Gardena location, pulls an initial snapshot, keeps a live WebSocket connection open for realtime updates, periodically reconciles state, and exposes the whole thing as Prometheus metrics. It also ships with a NixOS module, optional local scrape wiring, and a Grafana dashboard, so you do not have to spend your evening building plumbing before you can graph a watering zone.&lt;&#x2F;p&gt;
&lt;p&gt;The useful twist is water usage. Gardena does not give you a real flow meter reading here. The exporter models liters from valve runtime with a default liters-per-minute value, and lets you override that globally or per valve. So no, it is not a meter. But yes, it can still become a pretty useful guesstimate once you tune it to your actual zones.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-the-data-gets-there&quot;&gt;How the data gets there&lt;&#x2F;h2&gt;
&lt;p&gt;The path from Gardena to Grafana looks like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;Gardena API
  -&amp;gt; OAuth token
  -&amp;gt; location snapshot
  -&amp;gt; WebSocket updates
prometheus-gardena-exporter
  -&amp;gt; &amp;#x2F;metrics
Prometheus
  -&amp;gt; queries
Grafana
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;At startup, the exporter authenticates with the Husqvarna auth API using your application key and secret, discovers available locations, picks the configured location or auto-selects the only available one, and fetches a full location snapshot. After that it subscribes to the WebSocket stream and keeps an in-memory view of device and service state current. A periodic snapshot refresh reconciles anything the live stream might have missed.&lt;&#x2F;p&gt;
&lt;p&gt;That design matters because the Gardena API is not something you want to poll on every Prometheus scrape. The exporter absorbs the API weirdness once and presents a normal &lt;code&gt;&#x2F;metrics&lt;&#x2F;code&gt; endpoint to the rest of your monitoring stack.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;getting-it-into-your-flake&quot;&gt;Getting it into your flake&lt;&#x2F;h2&gt;
&lt;p&gt;You have two straightforward ways to consume this on NixOS.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;via-my-ijohanne-nur-packages-flake&quot;&gt;Via my &lt;code&gt;ijohanne&lt;&#x2F;code&gt; NUR packages flake&lt;&#x2F;h3&gt;
&lt;p&gt;If you already pull in my package set, import the module from there:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs = {
    nixpkgs.url = &amp;quot;github:NixOS&amp;#x2F;nixpkgs&amp;#x2F;nixos-25.11&amp;quot;;
    ijohanne-nur.url = &amp;quot;github:ijohanne&amp;#x2F;nur-packages&amp;quot;;
  };

  outputs = { nixpkgs, ijohanne-nur, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      modules = [
        ijohanne-nur.nixosModules.prometheus-gardena-exporter
        .&amp;#x2F;configuration.nix
      ];
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;directly-from-the-exporter-flake&quot;&gt;Directly from the exporter flake&lt;&#x2F;h3&gt;
&lt;p&gt;If you want to pin just this exporter and nothing else, pull it straight from the repo:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs = {
    nixpkgs.url = &amp;quot;github:NixOS&amp;#x2F;nixpkgs&amp;#x2F;nixos-25.11&amp;quot;;
    prometheus-gardena-exporter.url = &amp;quot;github:ijohanne&amp;#x2F;prometheus-gardena-exporter&amp;quot;;
  };

  outputs = { nixpkgs, prometheus-gardena-exporter, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      modules = [
        prometheus-gardena-exporter.nixosModules.default
        .&amp;#x2F;configuration.nix
      ];
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Same module either way. Pick whichever matches how you already manage third-party flake inputs.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;gardena-application-setup&quot;&gt;Gardena application setup&lt;&#x2F;h2&gt;
&lt;p&gt;Before NixOS gets involved, you need a Gardena application in the Husqvarna developer portal:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Create or open an app at &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;developer.husqvarnagroup.cloud&#x2F;apps&quot;&gt;developer.husqvarnagroup.cloud&#x2F;apps&lt;&#x2F;a&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Add a redirect URL such as &lt;code&gt;http:&#x2F;&#x2F;localhost&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Copy the application key and application secret&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;One detail worth calling out because it is mildly annoying the first time you hit it: the application key is used as both the &lt;code&gt;X-Api-Key&lt;&#x2F;code&gt; header and the OAuth &lt;code&gt;client_id&lt;&#x2F;code&gt;. The application secret is the OAuth &lt;code&gt;client_secret&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;The exporter includes a helper command that prints the raw token request:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix run . -- print-token-curl
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That expands to:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;curl -fsSL -X POST -d &amp;quot;grant_type=client_credentials&amp;amp;client_id=$GARDENA_APPLICATION_KEY&amp;amp;client_secret=$GARDENA_APPLICATION_SECRET&amp;quot; \
  https:&amp;#x2F;&amp;#x2F;api.authentication.husqvarnagroup.dev&amp;#x2F;v1&amp;#x2F;oauth2&amp;#x2F;token
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If you want to test the auth flow directly:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;export GARDENA_APPLICATION_KEY=&amp;quot;...&amp;quot;
export GARDENA_APPLICATION_SECRET=&amp;quot;...&amp;quot;

nix run . -- fetch-token --raw
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;To discover available locations:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix run . -- list-locations \
  --application-key &amp;quot;$GARDENA_APPLICATION_KEY&amp;quot; \
  --application-secret &amp;quot;$GARDENA_APPLICATION_SECRET&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If the application only has access to one location, the exporter can auto-select it. If it has access to more than one, set &lt;code&gt;locationId&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;nixos-configuration&quot;&gt;NixOS configuration&lt;&#x2F;h2&gt;
&lt;p&gt;Here is the practical NixOS setup:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  services.prometheus-gardena-exporter = {
    enable = true;
    enableLocalScraping = true;
    enableGrafanaDashboard = true;
    applicationKeyFile = &amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;gardena-application-key;
    applicationSecretFile = &amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;gardena-application-secret;
    locationId = &amp;quot;db789fe8-2af2-4eaf-a0ef-8ad795617971&amp;quot;;
    estimatedFlowLitersPerMinute = 3.5;
    estimatedFlowLitersPerMinuteByValve = {
      &amp;quot;5f7a3e6e-1111-2222-3333-444444444444&amp;quot; = 1.2;
      &amp;quot;8c9d0a1b-5555-6666-7777-888888888888&amp;quot; = 6.0;
    };
    validateAuthOnStartup = true;
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Three options do most of the quality-of-life work here:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;enableLocalScraping = true;&lt;&#x2F;code&gt; adds a Prometheus scrape target for you&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;enableGrafanaDashboard = true;&lt;&#x2F;code&gt; provisions the included Gardena dashboard automatically&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;validateAuthOnStartup = true;&lt;&#x2F;code&gt; fails fast if the app key or secret is wrong instead of looking “healthy” while doing nothing useful&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;There is also a nice detail in the NixOS module: the service uses systemd’s &lt;code&gt;LoadCredential&lt;&#x2F;code&gt;, so the application key and secret are read from files at runtime and never written into the Nix store.&lt;&#x2F;p&gt;
&lt;p&gt;By default the exporter listens on &lt;code&gt;127.0.0.1:9134&lt;&#x2F;code&gt;, which is exactly what I want for a local Prometheus scrape.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;finding-the-valve-ids-you-actually-need&quot;&gt;Finding the valve IDs you actually need&lt;&#x2F;h2&gt;
&lt;p&gt;Per-valve flow overrides are keyed by Gardena &lt;code&gt;service_id&lt;&#x2F;code&gt;, not by the friendly zone names from the app. That sounds annoying until you realize the exporter already has a helper for it:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix run . -- list-valves \
  --application-key &amp;quot;$GARDENA_APPLICATION_KEY&amp;quot; \
  --application-secret &amp;quot;$GARDENA_APPLICATION_SECRET&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Without Nix:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;cargo run -- list-valves \
  --application-key &amp;quot;$GARDENA_APPLICATION_KEY&amp;quot; \
  --application-secret &amp;quot;$GARDENA_APPLICATION_SECRET&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The output is tab-separated and includes:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;location_id	location	device_id	controller_name	service_id	valve_name
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That gives you the stable &lt;code&gt;service_id&lt;&#x2F;code&gt; values you need for &lt;code&gt;estimatedFlowLitersPerMinuteByValve&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
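&lt;p&gt;If you have more than a couple of zones, you can turn that output into a starting point for the Nix attribute set instead of copying IDs by hand. This is a minimal sketch. It assumes the columns appear in the order shown above and that the header row, if one is printed, starts with &lt;code&gt;location_id&lt;&#x2F;code&gt;. &lt;code&gt;valves_to_nix&lt;&#x2F;code&gt; is a hypothetical helper, and the rate it writes is only a placeholder for you to edit per zone.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: turn list-valves TSV output into Nix attribute lines.
# Columns: location_id location device_id controller_name service_id valve_name
# Argument: placeholder liters-per-minute value (defaults to 3.5)
valves_to_nix() {
  awk -F'\t' -v rate="${1:-3.5}" '
    # Skip the header row if the command prints one
    $1 == "location_id" { next }
    # One override per valve, keyed by service_id, with the name as a comment
    NF { printf "\"%s\" = %s; # %s\n", $5, rate, $6 }
  '
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Pipe the &lt;code&gt;list-valves&lt;&#x2F;code&gt; command above into &lt;code&gt;valves_to_nix 3.5&lt;&#x2F;code&gt; and paste the result into &lt;code&gt;estimatedFlowLitersPerMinuteByValve&lt;&#x2F;code&gt;. The valve name is kept as a Nix comment so you can still tell the zones apart.&lt;&#x2F;p&gt;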
&lt;h2 id=&quot;water-usage-is-a-model-not-a-meter&quot;&gt;Water usage is a model, not a meter&lt;&#x2F;h2&gt;
&lt;p&gt;This is the part worth being explicit about.&lt;&#x2F;p&gt;
&lt;p&gt;Gardena cannot tell this exporter how many liters actually flowed through a zone. There is no physical flow sensor data here. What you get instead is a modeled estimate based on valve-open time.&lt;&#x2F;p&gt;
&lt;p&gt;The built-in default is &lt;code&gt;3.5 L&#x2F;min&lt;&#x2F;code&gt;. That number is derived from a rough monthly estimate:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;5 m³&amp;#x2F;month = 5000 L&amp;#x2F;month
45 watering minutes&amp;#x2F;day × 30 days = 1350 watering minutes&amp;#x2F;month
5000 &amp;#x2F; 1350 ≈ 3.7 L&amp;#x2F;min
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The exporter rounds that down slightly and uses &lt;code&gt;3.5 L&#x2F;min&lt;&#x2F;code&gt; as a conservative default.&lt;&#x2F;p&gt;
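&lt;p&gt;The same arithmetic works with your own numbers. Here is a tiny sketch. &lt;code&gt;default_lpm&lt;&#x2F;code&gt; is a hypothetical helper that takes liters per month, watering minutes per day, and days per month.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: reproduce the liters-per-minute derivation above.
# Arguments: liters per month, watering minutes per day, days per month
default_lpm() {
  awk -v liters="$1" -v minutes="$2" -v days="$3" \
    'BEGIN { printf "%.1f\n", liters / (minutes * days) }'
}

default_lpm 5000 45 30   # prints 3.7
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Whatever it prints is still only a starting value for &lt;code&gt;estimatedFlowLitersPerMinute&lt;&#x2F;code&gt;, not a measurement.&lt;&#x2F;p&gt;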
&lt;p&gt;That default is fine as a starting point. It is also where people stop too early.&lt;&#x2F;p&gt;
&lt;p&gt;If one zone is a drip line and another is a spray-heavy bed, using one shared liters-per-minute value for both is going to produce nonsense-shaped precision. The fix is per-valve overrides:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.prometheus-gardena-exporter = {
  estimatedFlowLitersPerMinute = 3.5;
  estimatedFlowLitersPerMinuteByValve = {
    &amp;quot;5f7a3e6e-1111-2222-3333-444444444444&amp;quot; = 1.2;
    &amp;quot;8c9d0a1b-5555-6666-7777-888888888888&amp;quot; = 6.0;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That is where the model gets much more believable. You are still estimating, but now you are estimating with zone-specific assumptions instead of pretending every part of the garden consumes water the same way.&lt;&#x2F;p&gt;
&lt;p&gt;This is most useful when:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;only one valve is active at a time&lt;&#x2F;li&gt;
&lt;li&gt;each zone has a reasonably consistent emitter profile&lt;&#x2F;li&gt;
&lt;li&gt;your water pressure is fairly stable&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Treat the result as a guesstimate with sharp edges, not as a utility-grade reading. It is still good enough to answer real questions like “which zone is responsible for most of this month’s watering” or “did that new schedule double runtime for the thirsty bed again?”&lt;&#x2F;p&gt;
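&lt;p&gt;One way to sharpen a per-valve value is to read your main water meter before and after running a single zone for a fixed time, with nothing else drawing water. This is a suggestion, not something the exporter does. The hypothetical &lt;code&gt;calibrate_lpm&lt;&#x2F;code&gt; helper below assumes both readings are in liters and the runtime is in seconds.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: estimate one zone's flow from two water-meter readings.
# Arguments: reading before (L), reading after (L), zone runtime (s)
calibrate_lpm() {
  awk -v before="$1" -v after="$2" -v seconds="$3" \
    'BEGIN { printf "%.1f\n", (after - before) / (seconds / 60) }'
}

calibrate_lpm 1000 1036 600   # 36 L over 10 minutes, prints 3.6
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If your meter reads in cubic meters, multiply by 1000 first. The same caveats apply as for the model itself: the number only holds while pressure and emitters stay the same.&lt;&#x2F;p&gt;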
&lt;h2 id=&quot;what-you-get-in-prometheus-and-grafana&quot;&gt;What you get in Prometheus and Grafana&lt;&#x2F;h2&gt;
&lt;p&gt;The exporter covers the useful things first.&lt;&#x2F;p&gt;
&lt;p&gt;On the exporter-health side, you get metrics like:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;gardena_exporter_connected&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;gardena_exporter_last_successful_sync_timestamp_seconds&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;gardena_exporter_websocket_reconnects_total&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;gardena_exporter_snapshot_refreshes_total&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;On the device side, you get:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;gardena_sensor_soil_humidity_percent&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;gardena_sensor_soil_temperature_celsius&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;gardena_sensor_ambient_temperature_celsius&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;gardena_sensor_light_intensity_lux&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;gardena_device_battery_level_percent&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;gardena_device_rf_link_level_percent&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;And on the watering side, the interesting metrics are:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;gardena_valve_open&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;gardena_valve_duration_seconds&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;gardena_valve_estimated_water_liters_total&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;gardena_valve_estimated_current_water_flow_liters_per_minute&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;gardena_estimated_water_liters_total&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;gardena_estimated_current_water_flow_liters_per_minute&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The included Grafana dashboard leans into the practical views:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;exporter connection and active valve count&lt;&#x2F;li&gt;
&lt;li&gt;average soil humidity and low-battery quick checks&lt;&#x2F;li&gt;
&lt;li&gt;soil humidity and soil temperature by zone&lt;&#x2F;li&gt;
&lt;li&gt;ambient temperature and light where the hardware exposes them&lt;&#x2F;li&gt;
&lt;li&gt;valve status tables&lt;&#x2F;li&gt;
&lt;li&gt;selected-range estimated water usage&lt;&#x2F;li&gt;
&lt;li&gt;cumulative estimated water by zone&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;In other words, it starts where you would probably end up after an afternoon of dashboard-building anyway.&lt;&#x2F;p&gt;
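&lt;p&gt;The exporter-health metrics are also easy to check outside Grafana, for example from a cron job or during a quick SSH session. This is a sketch with hypothetical helper names. It assumes the metrics listed above appear in the standard Prometheus text format on the default &lt;code&gt;127.0.0.1:9134&lt;&#x2F;code&gt; listener. Labels are ignored, and the first sample of each metric wins.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helpers, not shipped with the exporter.
# metric_value NAME: print the integer value of the first sample of NAME
# found in Prometheus text format on stdin, ignoring any labels.
metric_value() {
  awk -v want="$1" '
    { name = $1; sub(/[{].*/, "", name) }
    name == want { printf "%d", $NF; exit }'
}

# gardena_health NOW MAX_AGE: read metrics on stdin and print ok,
# disconnected, or stale (last successful sync older than MAX_AGE seconds).
gardena_health() {
  metrics=$(cat)
  connected=$(printf '%s\n' "$metrics" | metric_value gardena_exporter_connected)
  last_sync=$(printf '%s\n' "$metrics" | metric_value gardena_exporter_last_successful_sync_timestamp_seconds)
  if [ "${connected:-0}" != "1" ]; then echo disconnected; return 1; fi
  if [ $(( $1 - ${last_sync:-0} )) -gt "$2" ]; then echo stale; return 1; fi
  echo ok
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Run it as &lt;code&gt;curl -s http:&#x2F;&#x2F;127.0.0.1:9134&#x2F;metrics | gardena_health &quot;$(date +%s)&quot; 900&lt;&#x2F;code&gt;. The 900-second threshold is arbitrary. Pick something comfortably longer than the gap you normally see between successful syncs.&lt;&#x2F;p&gt;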
&lt;h2 id=&quot;sharp-edges-and-limitations&quot;&gt;Sharp edges and limitations&lt;&#x2F;h2&gt;
&lt;p&gt;The exporter is honest about the shape of the underlying API:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;snapshot endpoints are rate-limited, so the exporter serves cached state and does not poll on every scrape&lt;&#x2F;li&gt;
&lt;li&gt;the WebSocket URL is short-lived, so the exporter requests it and connects immediately&lt;&#x2F;li&gt;
&lt;li&gt;some Gardena sensors only report &lt;code&gt;soilHumidity&lt;&#x2F;code&gt; and &lt;code&gt;soilTemperature&lt;&#x2F;code&gt;; ambient temperature and light are optional&lt;&#x2F;li&gt;
&lt;li&gt;estimated water totals live in memory and reset when the exporter restarts&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;None of those are deal-breakers. They are just things you want to know before you build a monthly watering report and then restart the service halfway through the month.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;garden-telemetry-that-behaves-like-infrastructure&quot;&gt;Garden telemetry that behaves like infrastructure&lt;&#x2F;h2&gt;
&lt;p&gt;That is really the point of this exporter.&lt;&#x2F;p&gt;
&lt;p&gt;Your irrigation system should not be a glossy black box that only exists in a mobile app. It should be queryable, graphable, alertable, and boring in the same way the rest of your stack is boring. Gardena provides the devices. Prometheus and Grafana provide the observability story. This exporter is the bridge between the two.&lt;&#x2F;p&gt;
&lt;p&gt;And the water estimation model is a good example of the overall approach: no fake certainty, no pretending the API provides data it does not. Just a useful, tunable approximation that gets better when you tell it what your valves actually do.&lt;&#x2F;p&gt;
&lt;p&gt;That is the kind of tradeoff I will take every time.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>BorgBackup and rsync.net on NixOS: Declarative Offsite Backups</title>
        <published>2026-04-06T12:00:00+00:00</published>
        <updated>2026-04-06T12:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/borgbackup-rsync-net-nixos/"/>
        <id>https://perlpimp.net/blog/borgbackup-rsync-net-nixos/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/borgbackup-rsync-net-nixos/">&lt;p&gt;If you want offsite backups on NixOS without building a whole backup platform around yourself, Borg plus rsync.net is a very good stopping point.&lt;&#x2F;p&gt;
&lt;p&gt;Borg gives you deduplication, compression, authenticated encryption, and a backup format that is pleasant to inspect and restore from. rsync.net gives you a plain SSH-accessible storage account on top of ZFS, which means you are not learning a proprietary API or shoving tarballs into an object store and hoping future-you still remembers the quirks.&lt;&#x2F;p&gt;
&lt;p&gt;The part that makes this especially nice on NixOS is &lt;code&gt;services.borgbackup.jobs&lt;&#x2F;code&gt;. You can start with the raw CLI until the workflow makes sense, then move the whole thing into declarative config with timers, secrets, retention, and systemd ordering.&lt;&#x2F;p&gt;
&lt;p&gt;This post goes in that order: first the five Borg commands you actually need to know, then the NixOS module, then the operational bits that matter in real life.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;install-borg-on-nixos&quot;&gt;Install Borg on NixOS&lt;&#x2F;h2&gt;
&lt;p&gt;If you want to run manual restore and verification commands on the machine itself, install Borg explicitly:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;environment.systemPackages = [ pkgs.borgbackup ];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If you are only using the NixOS module, the backup job will bring Borg along for the service anyway. I still like having the CLI available locally because a backup you cannot inspect manually is not a backup you really trust yet.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;rsync-net-setup-ssh-key-and-remote-borg-path&quot;&gt;rsync.net setup: SSH key and remote Borg path&lt;&#x2F;h2&gt;
&lt;p&gt;Before touching NixOS config, make sure the boring pieces work:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Generate an SSH key dedicated to this backup target.&lt;&#x2F;li&gt;
&lt;li&gt;Add the public key to your rsync.net account.&lt;&#x2F;li&gt;
&lt;li&gt;Verify that you can log in and run the Borg binary rsync.net exposes.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Older guides often hardcode a remote path such as &lt;code&gt;&#x2F;usr&#x2F;local&#x2F;bin&#x2F;borg1&#x2F;borg1&lt;&#x2F;code&gt;. rsync.net’s current docs instead expose versioned commands such as &lt;code&gt;borg14&lt;&#x2F;code&gt;, so test that one directly:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;ssh -i &amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;backup_ssh_key 12345@usw-s001.rsync.net borg14 --version
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If that works, you can tell Borg to use that remote binary via &lt;code&gt;BORG_REMOTE_PATH=borg14&lt;&#x2F;code&gt;. If rsync.net changes their preferred version later, follow their current docs instead of copying an old path from a blog post.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-borg-crash-course&quot;&gt;The Borg crash course&lt;&#x2F;h2&gt;
&lt;p&gt;You do not need to memorize all of Borg. For a normal backup workflow, you mostly care about &lt;code&gt;init&lt;&#x2F;code&gt;, &lt;code&gt;create&lt;&#x2F;code&gt;, &lt;code&gt;list&lt;&#x2F;code&gt;, &lt;code&gt;extract&lt;&#x2F;code&gt;, and &lt;code&gt;prune&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;I like setting the repository and SSH options up front:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;export REPO=&amp;quot;12345@usw-s001.rsync.net:backups&amp;#x2F;my-stuff&amp;quot;
export BORG_RSH=&amp;#x27;ssh -i &amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;backup_ssh_key&amp;#x27;
export BORG_REMOTE_PATH=&amp;#x27;borg14&amp;#x27;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;initialize-the-repository&quot;&gt;Initialize the repository&lt;&#x2F;h3&gt;
&lt;p&gt;This creates the remote repository metadata and chooses the encryption mode:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;borg init --encryption=repokey-blake2 &amp;quot;$REPO&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For rsync.net, &lt;code&gt;repokey-blake2&lt;&#x2F;code&gt; is the sensible default. Your data is encrypted before it leaves the machine, and the repository key lives with the repo but is protected by your passphrase. If you pick &lt;code&gt;encryption=none&lt;&#x2F;code&gt;, you are explicitly choosing simplicity over confidentiality.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;create-an-archive&quot;&gt;Create an archive&lt;&#x2F;h3&gt;
&lt;p&gt;This is the actual backup run:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;borg create --stats --progress \
  &amp;quot;$REPO::{hostname}-{now}&amp;quot; \
  &amp;#x2F;etc \
  &amp;#x2F;var&amp;#x2F;lib&amp;#x2F;myapp
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Every &lt;code&gt;create&lt;&#x2F;code&gt; adds a new archive to the same repository. Because Borg deduplicates chunks across archives, daily backups are not just “tar the whole machine again” with a nicer brand name.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;list-archives&quot;&gt;List archives&lt;&#x2F;h3&gt;
&lt;p&gt;This shows what is actually in the repository:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;borg list &amp;quot;$REPO&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Use this early and often. If you are not checking that archives exist with the names you expect, you are doing backup astrology.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;extract-data&quot;&gt;Extract data&lt;&#x2F;h3&gt;
&lt;p&gt;This is the command that matters when you are tired and something is already on fire:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;mkdir -p &amp;#x2F;tmp&amp;#x2F;borg-restore-test
cd &amp;#x2F;tmp&amp;#x2F;borg-restore-test
borg extract &amp;quot;$REPO::myhost-2026-04-06T02:00:00&amp;quot; etc&amp;#x2F;ssh&amp;#x2F;sshd_config
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;One important Borg gotcha: &lt;code&gt;extract&lt;&#x2F;code&gt; writes into your current working directory. Do not casually run it from &lt;code&gt;&#x2F;&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
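&lt;p&gt;One way to make that gotcha harder to trip over is to wrap extraction so it always lands in a fresh scratch directory. This is a sketch. It assumes &lt;code&gt;REPO&lt;&#x2F;code&gt; and the other Borg variables are exported as shown earlier, and &lt;code&gt;borg_restore_to_scratch&lt;&#x2F;code&gt; is a hypothetical helper name.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: extract into a new temporary directory, never into $PWD.
# Arguments: archive name, then optional paths inside the archive
borg_restore_to_scratch() {
  archive=$1
  shift
  dest=$(mktemp -d /tmp/borg-restore.XXXXXX) || return
  # The subshell keeps the cd from changing the caller's directory
  ( cd "$dest" || exit; borg extract "$REPO::$archive" "$@" ) || return
  # Print where the files went so you can inspect them
  echo "$dest"
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For example, &lt;code&gt;borg_restore_to_scratch myhost-2026-04-06T02:00:00 etc&#x2F;ssh&#x2F;sshd_config&lt;&#x2F;code&gt; prints the directory it restored into, and your shell stays where it was.&lt;&#x2F;p&gt;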
&lt;h3 id=&quot;prune-old-archives&quot;&gt;Prune old archives&lt;&#x2F;h3&gt;
&lt;p&gt;This is retention:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;borg prune --list --dry-run \
  --glob-archives &amp;#x27;{hostname}-*&amp;#x27; \
  --keep-daily 7 \
  --keep-weekly 4 \
  --keep-monthly 6 \
  &amp;quot;$REPO&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Run the dry run first. Always. &lt;code&gt;prune&lt;&#x2F;code&gt; is how you turn “this looks tidy” into “I deleted the only restore point from two Tuesdays ago.”&lt;&#x2F;p&gt;
&lt;p&gt;Also note that pruning and space reclamation are separate steps in Borg 1.2 and later. &lt;code&gt;prune&lt;&#x2F;code&gt; decides which archives to delete, and &lt;code&gt;compact&lt;&#x2F;code&gt; reclaims the freed space afterward. When you prune by hand, you need to run both.&lt;&#x2F;p&gt;
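&lt;p&gt;If you run retention by hand regularly, a small wrapper can keep the dry run as the default and pair the real prune with &lt;code&gt;compact&lt;&#x2F;code&gt;. This is a sketch that reuses the retention policy from the prune example. &lt;code&gt;borg_retention&lt;&#x2F;code&gt; is a hypothetical name. It assumes &lt;code&gt;BORG_RSH&lt;&#x2F;code&gt; and &lt;code&gt;BORG_REMOTE_PATH&lt;&#x2F;code&gt; are already exported, and that the passphrase is available, for example via &lt;code&gt;BORG_PASSPHRASE&lt;&#x2F;code&gt; or an interactive prompt.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical wrapper around the retention policy used in this post.
# Default is a dry run; pass "apply" as the second argument to really delete.
borg_retention() {
  repo=$1
  mode=${2:-dry-run}
  if [ "$mode" = "apply" ]; then
    # Delete archives outside the policy, then free the space they used
    borg prune --list --glob-archives '{hostname}-*' \
      --keep-daily 7 --keep-weekly 4 --keep-monthly 6 "$repo" || return
    borg compact "$repo"
  else
    # Only report what prune would delete
    borg prune --list --dry-run --glob-archives '{hostname}-*' \
      --keep-daily 7 --keep-weekly 4 --keep-monthly 6 "$repo"
  fi
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Run &lt;code&gt;borg_retention &quot;$REPO&quot;&lt;&#x2F;code&gt; first and read the list. Only then run &lt;code&gt;borg_retention &quot;$REPO&quot; apply&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;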
&lt;h2 id=&quot;moving-to-the-nixos-module&quot;&gt;Moving to the NixOS module&lt;&#x2F;h2&gt;
&lt;p&gt;Once the CLI flow makes sense, &lt;code&gt;services.borgbackup.jobs&lt;&#x2F;code&gt; is the nicer way to live with it.&lt;&#x2F;p&gt;
&lt;p&gt;This is a minimal declarative rsync.net job:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ config, ... }:

{
  sops.secrets.backup_ssh_key = {
    mode = &amp;quot;0400&amp;quot;;
    owner = &amp;quot;root&amp;quot;;
    group = &amp;quot;root&amp;quot;;
  };

  sops.secrets.borg_passphrase = {
    mode = &amp;quot;0400&amp;quot;;
    owner = &amp;quot;root&amp;quot;;
    group = &amp;quot;root&amp;quot;;
  };

  services.borgbackup.jobs.documents = {
    paths = [ &amp;quot;&amp;#x2F;srv&amp;#x2F;documents&amp;quot; ];
    repo = &amp;quot;12345@usw-s001.rsync.net:backups&amp;#x2F;documents&amp;quot;;
    doInit = true;

    encryption = {
      mode = &amp;quot;repokey-blake2&amp;quot;;
      passCommand = &amp;quot;cat ${config.sops.secrets.borg_passphrase.path}&amp;quot;;
    };

    compression = &amp;quot;auto,zstd&amp;quot;;
    startAt = &amp;quot;daily&amp;quot;;

    environment = {
      BORG_RSH = &amp;quot;ssh -i ${config.sops.secrets.backup_ssh_key.path}&amp;quot;;
      BORG_REMOTE_PATH = &amp;quot;borg14&amp;quot;;
    };

    prune.keep = {
      daily = 7;
      weekly = 4;
      monthly = 6;
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That already gets you most of the value:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;a scheduled backup job&lt;&#x2F;li&gt;
&lt;li&gt;automatic repository initialization on first run&lt;&#x2F;li&gt;
&lt;li&gt;encrypted backups without hardcoding secrets into the Nix store&lt;&#x2F;li&gt;
&lt;li&gt;retention policy in the same place as the backup definition&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;If you prefer to initialize the repository manually first, you can do that and set &lt;code&gt;doInit = false;&lt;&#x2F;code&gt;. I still like &lt;code&gt;doInit = true;&lt;&#x2F;code&gt; for fresh hosts because it removes one more piece of “remember to do this by hand later.”&lt;&#x2F;p&gt;
&lt;h2 id=&quot;encryption-modes-what-to-pick&quot;&gt;Encryption modes: what to pick&lt;&#x2F;h2&gt;
&lt;p&gt;For rsync.net, I would usually choose one of these:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;mode = &quot;repokey-blake2&quot;&lt;&#x2F;code&gt; if you want the standard client-side encryption setup&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;mode = &quot;none&quot;&lt;&#x2F;code&gt; only if you intentionally do not want Borg encryption&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The “none” variant is valid, just not my default:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.borgbackup.jobs.media = {
  paths = [ &amp;quot;&amp;#x2F;srv&amp;#x2F;media&amp;quot; ];
  repo = &amp;quot;12345@usw-s001.rsync.net:backups&amp;#x2F;media&amp;quot;;
  encryption.mode = &amp;quot;none&amp;quot;;
  compression = &amp;quot;auto,zstd&amp;quot;;
  startAt = &amp;quot;daily&amp;quot;;
  environment = {
    BORG_RSH = &amp;quot;ssh -i ${config.sops.secrets.backup_ssh_key.path}&amp;quot;;
    BORG_REMOTE_PATH = &amp;quot;borg14&amp;quot;;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If the remote side is outside your trust boundary, use encryption. That is the entire point of Borg being pleasant for offsite backups.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;dump-first-back-up-second&quot;&gt;Dump first, back up second&lt;&#x2F;h2&gt;
&lt;p&gt;Backing up a live database directory directly is how you get a repository full of consistency-shaped lies. The safer pattern is:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;dump the database into a staging directory&lt;&#x2F;li&gt;
&lt;li&gt;make Borg wait for that dump&lt;&#x2F;li&gt;
&lt;li&gt;back up the dump directory&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;For anything non-trivial, I prefer a dedicated oneshot service over jamming everything into &lt;code&gt;preHook&lt;&#x2F;code&gt;. You get clearer logs, a real unit name, and a cleaner failure boundary.&lt;&#x2F;p&gt;
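&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ config, pkgs, ... }:

{
  # Create the staging directory as root; the postgres user cannot
  # create it under &amp;#x2F;var&amp;#x2F;backup by itself.
  systemd.tmpfiles.rules = [
    &amp;quot;d &amp;#x2F;var&amp;#x2F;backup&amp;#x2F;postgresql 0700 postgres postgres -&amp;quot;
  ];

  systemd.services.pg-dump-for-borg = {
    description = &amp;quot;Create PostgreSQL dump before Borg backup&amp;quot;;
    requires = [ &amp;quot;postgresql.service&amp;quot; ];
    after = [ &amp;quot;postgresql.service&amp;quot; ];
    serviceConfig = {
      Type = &amp;quot;oneshot&amp;quot;;
      User = &amp;quot;postgres&amp;quot;;
    };
    # Dump to a temporary file, then rename it, so a failed dump
    # never overwrites the last good one.
    script = &amp;#x27;&amp;#x27;
      ${config.services.postgresql.package}&amp;#x2F;bin&amp;#x2F;pg_dumpall \
        --clean \
        --if-exists \
        &amp;gt; &amp;#x2F;var&amp;#x2F;backup&amp;#x2F;postgresql&amp;#x2F;all.sql.tmp
      ${pkgs.coreutils}&amp;#x2F;bin&amp;#x2F;mv &amp;#x2F;var&amp;#x2F;backup&amp;#x2F;postgresql&amp;#x2F;all.sql.tmp &amp;#x2F;var&amp;#x2F;backup&amp;#x2F;postgresql&amp;#x2F;all.sql
    &amp;#x27;&amp;#x27;;
  };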

  systemd.services.&amp;quot;borgbackup-job-postgresql&amp;quot; = {
    requires = [ &amp;quot;pg-dump-for-borg.service&amp;quot; ];
    after = [ &amp;quot;pg-dump-for-borg.service&amp;quot; ];
  };

  services.borgbackup.jobs.postgresql = {
    paths = [ &amp;quot;&amp;#x2F;var&amp;#x2F;backup&amp;#x2F;postgresql&amp;quot; ];
    repo = &amp;quot;12345@usw-s001.rsync.net:backups&amp;#x2F;postgresql&amp;quot;;
    doInit = true;

    encryption = {
      mode = &amp;quot;repokey-blake2&amp;quot;;
      passCommand = &amp;quot;cat ${config.sops.secrets.borg_passphrase.path}&amp;quot;;
    };

    compression = &amp;quot;auto,zstd&amp;quot;;
    startAt = &amp;quot;daily&amp;quot;;

    environment = {
      BORG_RSH = &amp;quot;ssh -i ${config.sops.secrets.backup_ssh_key.path}&amp;quot;;
      BORG_REMOTE_PATH = &amp;quot;borg14&amp;quot;;
    };

    prune.keep = {
      daily = 7;
      weekly = 4;
      monthly = 6;
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The key detail is &lt;code&gt;requires&lt;&#x2F;code&gt; plus &lt;code&gt;after&lt;&#x2F;code&gt; on &lt;code&gt;borgbackup-job-postgresql&lt;&#x2F;code&gt;. Starting the Borg job pulls in the dump unit first, and a dump failure stops the backup instead of quietly archiving stale data from yesterday.&lt;&#x2F;p&gt;
&lt;p&gt;For small prep work, &lt;code&gt;preHook&lt;&#x2F;code&gt; is fine. For database dumps, VM snapshots, or anything else you may need to debug later, a dedicated unit usually ages better.&lt;&#x2F;p&gt;
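&lt;p&gt;For contrast, here is the kind of small prep work that fits comfortably in &lt;code&gt;preHook&lt;&#x2F;code&gt;. This is a sketch and not part of the setup above. It assumes &lt;code&gt;&#x2F;srv&#x2F;documents&lt;&#x2F;code&gt; is its own mounted filesystem, and it refuses to run the backup when that mount is missing, so an unplugged disk does not become an archive of an empty directory.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ pkgs, ... }:

{
  services.borgbackup.jobs.documents.preHook = &amp;#x27;&amp;#x27;
    # Assumes &amp;#x2F;srv&amp;#x2F;documents is its own mount. If it is missing,
    # abort before borg create archives an empty directory.
    if ! ${pkgs.util-linux}&amp;#x2F;bin&amp;#x2F;mountpoint -q &amp;#x2F;srv&amp;#x2F;documents; then
      echo &amp;quot;&amp;#x2F;srv&amp;#x2F;documents is not mounted, skipping backup&amp;quot; &amp;gt;&amp;amp;2
      exit 1
    fi
  &amp;#x27;&amp;#x27;;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Because the hook exits non-zero, the job unit should end up failed. The skipped run then shows up in &lt;code&gt;systemctl status&lt;&#x2F;code&gt; instead of staying silently green.&lt;&#x2F;p&gt;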
&lt;h2 id=&quot;secrets-with-sops-nix&quot;&gt;Secrets with sops-nix&lt;&#x2F;h2&gt;
&lt;p&gt;This is where &lt;code&gt;sops-nix&lt;&#x2F;code&gt; earns its keep. The two secrets you usually care about are:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;the SSH private key used to reach rsync.net&lt;&#x2F;li&gt;
&lt;li&gt;the Borg passphrase&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Let sops-nix decrypt both at activation time instead of embedding them in Nix expressions:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;sops.secrets.backup_ssh_key = {
  mode = &amp;quot;0400&amp;quot;;
  owner = &amp;quot;root&amp;quot;;
  group = &amp;quot;root&amp;quot;;
};

sops.secrets.borg_passphrase = {
  mode = &amp;quot;0400&amp;quot;;
  owner = &amp;quot;root&amp;quot;;
  group = &amp;quot;root&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then reference the generated paths:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;environment = {
  BORG_RSH = &amp;quot;ssh -i ${config.sops.secrets.backup_ssh_key.path}&amp;quot;;
  BORG_REMOTE_PATH = &amp;quot;borg14&amp;quot;;
};

encryption = {
  mode = &amp;quot;repokey-blake2&amp;quot;;
  passCommand = &amp;quot;cat ${config.sops.secrets.borg_passphrase.path}&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That keeps the secret material out of the world-readable Nix store, which is the line you do not want to cross just because “it was only a backup key.”&lt;&#x2F;p&gt;
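&lt;p&gt;Both snippets assume sops-nix already knows which encrypted file to read and how to decrypt it. Here is a minimal sketch of that wiring. It assumes a per-host &lt;code&gt;secrets&#x2F;myhost.yaml&lt;&#x2F;code&gt; (a hypothetical path) that contains &lt;code&gt;backup_ssh_key&lt;&#x2F;code&gt; and &lt;code&gt;borg_passphrase&lt;&#x2F;code&gt; entries, decrypted with an age identity derived from the host SSH key.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  # Hypothetical per-host file holding backup_ssh_key and borg_passphrase.
  sops.defaultSopsFile = .&amp;#x2F;secrets&amp;#x2F;myhost.yaml;

  # Decrypt with an age identity derived from the host SSH key.
  sops.age.sshKeyPaths = [ &amp;quot;&amp;#x2F;etc&amp;#x2F;ssh&amp;#x2F;ssh_host_ed25519_key&amp;quot; ];
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;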
&lt;h2 id=&quot;monitoring-and-verification&quot;&gt;Monitoring and verification&lt;&#x2F;h2&gt;
&lt;p&gt;Do not stop at “the timer exists.”&lt;&#x2F;p&gt;
&lt;p&gt;At minimum, check the service and read the logs:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;systemctl status borgbackup-job-documents.service
journalctl -u borgbackup-job-documents.service -e
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then verify the repository itself:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Run as root: the sops-nix secrets are mode 0400 and owned by root.
export REPO=&amp;quot;12345@usw-s001.rsync.net:backups&amp;#x2F;documents&amp;quot;
export BORG_RSH=&amp;#x27;ssh -i &amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;backup_ssh_key&amp;#x27;
export BORG_REMOTE_PATH=&amp;#x27;borg14&amp;#x27;
export BORG_PASSCOMMAND=&amp;#x27;cat &amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;borg_passphrase&amp;#x27;

borg list &amp;quot;$REPO&amp;quot;
borg info &amp;quot;$REPO&amp;quot;
borg check &amp;quot;$REPO&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And every so often, do the part people skip: restore a file into a throwaway directory and confirm it is actually readable. “The backup job stayed green for six months” is not the same thing as “restore works.”&lt;&#x2F;p&gt;
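&lt;p&gt;Logs only help if someone reads them. One way to get told about a failed run is a systemd &lt;code&gt;OnFailure=&lt;&#x2F;code&gt; hook on the generated unit. This is a sketch: &lt;code&gt;notify-failure@&lt;&#x2F;code&gt; is a hypothetical template unit, and its script only writes an error-priority journal entry. That is the spot where you would plug in your real alerting.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ pkgs, ... }:

{
  # Start notify-failure@&amp;lt;unit name&amp;gt;.service whenever the backup unit fails.
  systemd.services.&amp;quot;borgbackup-job-documents&amp;quot;.onFailure = [
    &amp;quot;notify-failure@%n.service&amp;quot;
  ];

  # Hypothetical template unit: swap the script for your real notifier.
  systemd.services.&amp;quot;notify-failure@&amp;quot; = {
    description = &amp;quot;Report failure of %i&amp;quot;;
    serviceConfig.Type = &amp;quot;oneshot&amp;quot;;
    scriptArgs = &amp;quot;%i&amp;quot;;
    script = &amp;#x27;&amp;#x27;
      echo &amp;quot;unit $1 failed&amp;quot; | ${pkgs.systemd}&amp;#x2F;bin&amp;#x2F;systemd-cat -p err -t backup-alert
    &amp;#x27;&amp;#x27;;
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;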
&lt;h2 id=&quot;the-shape-of-a-good-setup&quot;&gt;The shape of a good setup&lt;&#x2F;h2&gt;
&lt;p&gt;The nice thing about this stack is that it scales from “I just need offsite copies of a few directories” to “I need a clean, repeatable backup story for stateful services” without changing tools halfway through.&lt;&#x2F;p&gt;
&lt;p&gt;The progression is straightforward:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Learn &lt;code&gt;borg init&lt;&#x2F;code&gt;, &lt;code&gt;create&lt;&#x2F;code&gt;, &lt;code&gt;list&lt;&#x2F;code&gt;, &lt;code&gt;extract&lt;&#x2F;code&gt;, and &lt;code&gt;prune&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Confirm rsync.net access with your SSH key and remote Borg path.&lt;&#x2F;li&gt;
&lt;li&gt;Move the workflow into &lt;code&gt;services.borgbackup.jobs&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Put the SSH key and passphrase behind &lt;code&gt;sops-nix&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Make pre-backup state capture explicit with systemd dependencies.&lt;&#x2F;li&gt;
&lt;li&gt;Test restores like you expect your future self to need them.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Borg does not make backups magical. It just makes them boring in the best possible way. On NixOS, that is exactly what you want.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Refactoring Dotfiles into a Dendritic Layout</title>
        <published>2026-04-05T12:00:00+00:00</published>
        <updated>2026-04-05T12:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/refactoring-dotfiles-dendritic-layout/"/>
        <id>https://perlpimp.net/blog/refactoring-dotfiles-dendritic-layout/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/refactoring-dotfiles-dendritic-layout/">&lt;p&gt;I recently reorganized my dotfiles repo into a Dendritic-style layout. The interesting part was not the rename spree. It was the change in what the repository &lt;em&gt;means&lt;&#x2F;em&gt; when you open it.&lt;&#x2F;p&gt;
&lt;p&gt;Before, the repo was mostly shaped around hosts plus a big &lt;code&gt;configs&#x2F;&lt;&#x2F;code&gt; tree. That worked for a long time. But as the flake grew across NixOS, nix-darwin, Home Manager, custom packages, patches, and a pile of shared service logic, the old structure started making more and more questions annoying to answer.&lt;&#x2F;p&gt;
&lt;p&gt;Where does this behavior belong?&lt;&#x2F;p&gt;
&lt;p&gt;Is this reusable or host-local?&lt;&#x2F;p&gt;
&lt;p&gt;Should a new machine copy that chunk from another host, or import something more general?&lt;&#x2F;p&gt;
&lt;p&gt;Why does this thing live under &lt;code&gt;configs&#x2F;&lt;&#x2F;code&gt; if it is clearly not just “configuration” in the narrow sense?&lt;&#x2F;p&gt;
&lt;p&gt;The Dendritic refactor improved a lot of that. It also introduced some real costs. This post is not “everyone should organize dotfiles this way.” It is a higher-level look at what got clearer, what setting up a new host looks like now, and why the clearer architecture also buys you a certain amount of boilerplate and indirection.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-dendritic-means-here&quot;&gt;What Dendritic means here&lt;&#x2F;h2&gt;
&lt;p&gt;Dendritic is not a framework. It is not a new library. It is a repo organization pattern built around flake-parts-style modules and feature-oriented composition.&lt;&#x2F;p&gt;
&lt;p&gt;The basic idea is that you stop treating your repository primarily as:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;a set of hosts&lt;&#x2F;li&gt;
&lt;li&gt;plus a bag of helper files&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;and start treating it as:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;a set of reusable aspects and services&lt;&#x2F;li&gt;
&lt;li&gt;grouped by whether they are public, private, or inventory data&lt;&#x2F;li&gt;
&lt;li&gt;with hosts acting more like manifests that opt into those pieces&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That distinction matters because it changes the default question from:&lt;&#x2F;p&gt;
&lt;p&gt;“Which host owns this logic?”&lt;&#x2F;p&gt;
&lt;p&gt;to:&lt;&#x2F;p&gt;
&lt;p&gt;“Which feature or aspect should define this logic?”&lt;&#x2F;p&gt;
&lt;p&gt;For a small repo, that can be overkill. For a repo that has crossed into multi-host, multi-platform, partly-public, partly-private territory, that question is often the more useful one.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;old-shape-versus-new-shape&quot;&gt;Old shape versus new shape&lt;&#x2F;h2&gt;
&lt;p&gt;The old repo had a familiar feel:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;.
├── flake.nix
├── hosts&amp;#x2F;
│   ├── goose&amp;#x2F;
│   ├── ij-desktop&amp;#x2F;
│   ├── khosu&amp;#x2F;
│   └── ...
└── configs&amp;#x2F;
    ├── programs&amp;#x2F;
    ├── profiles&amp;#x2F;
    ├── users&amp;#x2F;
    ├── server.nix
    ├── network.nix
    ├── secrets.nix
    └── ...
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That is a perfectly reasonable place to start. Hosts are visible. Shared things exist. Nothing feels especially abstract.&lt;&#x2F;p&gt;
&lt;p&gt;The problem is that &lt;code&gt;configs&#x2F;&lt;&#x2F;code&gt; becomes a junk drawer. Languages, user defaults, deployment helpers, shared Nix settings, patches, service building blocks, and private inventory all end up adjacent even though they are not really the same kind of thing.&lt;&#x2F;p&gt;
&lt;p&gt;The new layout is more opinionated:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;.
├── flake.nix
├── hosts&amp;#x2F;
│   ├── goose&amp;#x2F;
│   ├── ij-desktop&amp;#x2F;
│   ├── khosu&amp;#x2F;
│   └── ...
└── modules&amp;#x2F;
    ├── community&amp;#x2F;
    │   ├── home&amp;#x2F;
    │   ├── nixos&amp;#x2F;
    │   ├── darwin&amp;#x2F;
    │   ├── lib&amp;#x2F;
    │   ├── packages&amp;#x2F;
    │   └── patches&amp;#x2F;
    └── private&amp;#x2F;
        ├── inventory&amp;#x2F;
        ├── nixos&amp;#x2F;
        ├── darwin&amp;#x2F;
        ├── home&amp;#x2F;
        └── shared&amp;#x2F;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This split does two important things immediately.&lt;&#x2F;p&gt;
&lt;p&gt;First, it separates &lt;em&gt;publicly reusable&lt;&#x2F;em&gt; modules from &lt;em&gt;repo-private&lt;&#x2F;em&gt; composition. Second, it gives inventory data its own home instead of letting it blur into general-purpose modules.&lt;&#x2F;p&gt;
&lt;p&gt;That alone made the repo easier to read.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-got-clearer&quot;&gt;What got clearer&lt;&#x2F;h2&gt;
&lt;p&gt;The biggest gain was not “more reuse,” although there is more reuse. The biggest gain was clarity of intent.&lt;&#x2F;p&gt;
&lt;p&gt;In the new tree, directories and file names communicate purpose more directly:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;modules&#x2F;community&lt;&#x2F;code&gt; means reusable surface area I would feel good exporting&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;modules&#x2F;private&lt;&#x2F;code&gt; means repo-specific composition I want to reuse internally&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;modules&#x2F;private&#x2F;inventory&lt;&#x2F;code&gt; means facts about this fleet, not abstract modules&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That is much better than a flat &lt;code&gt;configs&#x2F;&lt;&#x2F;code&gt; namespace where everything looks like just another config file.&lt;&#x2F;p&gt;
&lt;p&gt;It also made several kinds of questions easier to answer.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;easier-question-is-this-public-private-or-inventory&quot;&gt;Easier question: is this public, private, or inventory?&lt;&#x2F;h2&gt;
&lt;p&gt;This used to be fuzzy. A helper might be generic in spirit but buried next to host-specific assumptions. A user registry might live near modules that actually describe behavior. Network facts might be imported like they were just another feature.&lt;&#x2F;p&gt;
&lt;p&gt;Now the repo answers that more directly:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;public reusable composition goes in &lt;code&gt;modules&#x2F;community&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;private reusable composition goes in &lt;code&gt;modules&#x2F;private&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;shared facts about hosts and users go in &lt;code&gt;modules&#x2F;private&#x2F;inventory&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That makes maintenance easier because the placement itself carries architectural meaning.&lt;&#x2F;p&gt;
&lt;p&gt;If a file lands in the wrong layer, it feels wrong immediately.&lt;&#x2F;p&gt;
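&lt;p&gt;Here is a hypothetical sketch that makes the inventory distinction concrete. It is not this repo’s actual code. The inventory is plain data with no options or services, and a separate module turns that data into behavior. The addresses are documentation placeholders.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# Hypothetical sketch: inventory is plain data, and a separate
# module turns it into behavior. Addresses are placeholders.
{ lib, ... }:

let
  # In a real repo this attrset would live in its own inventory file
  # and be shared by every host that needs it.
  inventory = {
    goose = { lanAddress = &amp;quot;192.0.2.10&amp;quot;; roles = [ &amp;quot;server&amp;quot; ]; };
    ij-desktop = { lanAddress = &amp;quot;192.0.2.20&amp;quot;; roles = [ &amp;quot;workstation&amp;quot; ]; };
  };
in
{
  # networking.hosts maps an IP address to a list of host names.
  networking.hosts = lib.mapAttrs&amp;#x27;
    (name: host: lib.nameValuePair host.lanAddress [ name ])
    inventory;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;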
&lt;h2 id=&quot;easier-question-where-should-a-new-behavior-go&quot;&gt;Easier question: where should a new behavior go?&lt;&#x2F;h2&gt;
&lt;p&gt;Under the old layout, the easiest move was often to open the closest host file and add more logic there.&lt;&#x2F;p&gt;
&lt;p&gt;That works until it doesn’t. Host files slowly become half manifest, half implementation, and then you start copying chunks between them because some other host wants “basically the same thing.”&lt;&#x2F;p&gt;
&lt;p&gt;With the new structure, the default move is more often:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;decide whether the behavior is public, private, or inventory&lt;&#x2F;li&gt;
&lt;li&gt;decide whether it is a service, aspect, profile, shared helper, package, or patch&lt;&#x2F;li&gt;
&lt;li&gt;make the host opt into it&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;That is a more structured workflow. It creates a little more friction up front, but it produces cleaner boundaries over time.&lt;&#x2F;p&gt;
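&lt;p&gt;In the smallest case, those three steps produce a named aspect that holds the behavior once, plus a host that opts in with a single import. Here is a hypothetical sketch. The file name and the &lt;code&gt;nixCli&lt;&#x2F;code&gt; attribute are illustrative and not taken from this repo.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# modules&amp;#x2F;private&amp;#x2F;nixos&amp;#x2F;aspects&amp;#x2F;nix-cli.nix (hypothetical)
# A private aspect: shared Nix CLI defaults, defined in one place.
{ ... }:

{
  nix.settings.experimental-features = [ &amp;quot;nix-command&amp;quot; &amp;quot;flakes&amp;quot; ];
  nix.settings.auto-optimise-store = true;
}

# Each host then opts in from its manifest, for example:
#   imports = [ modules.private.nixos.aspects.nixCli ];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;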
&lt;h2 id=&quot;easier-question-what-does-this-host-actually-do&quot;&gt;Easier question: what does this host actually do?&lt;&#x2F;h2&gt;
&lt;p&gt;One of the most noticeable improvements is that host files got thinner.&lt;&#x2F;p&gt;
&lt;p&gt;A host like &lt;code&gt;goose&lt;&#x2F;code&gt; now reads more like a statement of composition:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;imports = [
  modules.public.nixos.aspects.serverBase
  (import modules.private.nixos.aspects.managedRemoteHost {
    host = &amp;quot;goose&amp;quot;;
    sopsFile = ..&amp;#x2F;..&amp;#x2F;secrets&amp;#x2F;goose.yaml;
  })
  modules.private.nixos.aspects.gooseServices
  modules.public.nixos.services.smsGatewayClient
  (import modules.public.nixos.services.wanFailover { ... })
  .&amp;#x2F;hardware-configuration.nix
  .&amp;#x2F;networking.nix
];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That is a nicer top-level read than a long host file that mixes identity, hardware, service definitions, package choices, firewall details, and helper logic all in one place.&lt;&#x2F;p&gt;
&lt;p&gt;You can look at the host and quickly understand its shape:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;base server behavior&lt;&#x2F;li&gt;
&lt;li&gt;remote-host management behavior&lt;&#x2F;li&gt;
&lt;li&gt;host-specific service bundle&lt;&#x2F;li&gt;
&lt;li&gt;explicit shared services&lt;&#x2F;li&gt;
&lt;li&gt;hardware and networking&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That is a good trade. The host becomes easier to scan because implementation has moved somewhere named.&lt;&#x2F;p&gt;
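&lt;p&gt;The &lt;code&gt;managedRemoteHost&lt;&#x2F;code&gt; import above takes arguments before it becomes a module. One plausible shape for a parameterized aspect like that is a function from a few arguments to an ordinary NixOS module. The sketch below is an assumption, not the repo’s real implementation.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# Hypothetical parameterized aspect, imported as:
#   (import .&amp;#x2F;managed-remote-host.nix { host = &amp;quot;goose&amp;quot;; sopsFile = ..&amp;#x2F;..&amp;#x2F;secrets&amp;#x2F;goose.yaml; })
{ host, sopsFile }:

# Applying the arguments returns an ordinary NixOS module.
# Assumes the sops-nix NixOS module is imported elsewhere.
{ ... }:

{
  networking.hostName = host;
  sops.defaultSopsFile = sopsFile;
  services.openssh.enable = true;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;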
&lt;h2 id=&quot;easier-question-how-do-i-set-up-a-new-host-now&quot;&gt;Easier question: how do I set up a new host now?&lt;&#x2F;h2&gt;
&lt;p&gt;This is where the new layout feels most obviously different.&lt;&#x2F;p&gt;
&lt;p&gt;Before, adding a new host often meant copying an existing host, then editing it until it fit. You could do that fast, but it encouraged silent inheritance by copy-paste. The host would work, but the structure would not necessarily tell you what was reusable and what was accidental.&lt;&#x2F;p&gt;
&lt;p&gt;Now the setup path is more deliberate.&lt;&#x2F;p&gt;
&lt;p&gt;At a high level, adding a host means:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;create the host directory&lt;&#x2F;li&gt;
&lt;li&gt;decide which reusable aspects apply&lt;&#x2F;li&gt;
&lt;li&gt;decide what inventory entries it needs&lt;&#x2F;li&gt;
&lt;li&gt;add only the genuinely host-local configuration in the host file&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;That feels a bit more like architecture and a bit less like cloning a previous machine.&lt;&#x2F;p&gt;
&lt;p&gt;For example, a new server host does not need to rediscover how to be “a server” from scratch. It can import a named base aspect. A workstation does not need to inline shared Nix CLI defaults, shared shell behavior, or local flake deployment setup if those already exist as named pieces.&lt;&#x2F;p&gt;
&lt;p&gt;This is the part I like most: new hosts are less about copying a specimen and more about selecting building blocks.&lt;&#x2F;p&gt;
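&lt;p&gt;Put together, a new server host can start as little more than a manifest. This is a hypothetical sketch with a made-up host name. It assumes the flake passes the module tree to hosts as a &lt;code&gt;modules&lt;&#x2F;code&gt; argument through &lt;code&gt;specialArgs&lt;&#x2F;code&gt;. That detail matters because &lt;code&gt;imports&lt;&#x2F;code&gt; cannot depend on ordinary module arguments.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# hosts&amp;#x2F;newbox&amp;#x2F;default.nix (hypothetical host name)
# Assumes the module tree arrives as a specialArgs value named modules.
{ modules, ... }:

{
  imports = [
    # Reusable behavior, selected by name.
    modules.public.nixos.aspects.serverBase
    # Genuinely host-local pieces.
    .&amp;#x2F;hardware-configuration.nix
  ];

  networking.hostName = &amp;quot;newbox&amp;quot;;

  # Keep this at the NixOS release the machine was first installed with.
  system.stateVersion = &amp;quot;25.11&amp;quot;;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;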
&lt;h2 id=&quot;over-time-some-things-become-much-easier-to-understand&quot;&gt;Over time, some things become much easier to understand&lt;&#x2F;h2&gt;
&lt;p&gt;This is the main payoff.&lt;&#x2F;p&gt;
&lt;p&gt;When the repository grows, understanding is less about reading every host and more about understanding the vocabulary of the repo:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;what an aspect means&lt;&#x2F;li&gt;
&lt;li&gt;what belongs in a service module versus a profile&lt;&#x2F;li&gt;
&lt;li&gt;where inventory lives&lt;&#x2F;li&gt;
&lt;li&gt;which pieces are intended to be shared across NixOS, Darwin, Home Manager, or packages&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Once that vocabulary stabilizes, the codebase gets easier to navigate because concepts have homes.&lt;&#x2F;p&gt;
&lt;p&gt;You stop finding the same kind of logic in three or four different places under slightly different names. You stop wondering whether a host-local tweak is actually the canonical implementation of a broader idea. You stop treating the &lt;code&gt;flake.nix&lt;&#x2F;code&gt; file as a manual import spreadsheet.&lt;&#x2F;p&gt;
&lt;p&gt;That last point matters more than it sounds. A smaller, more manifest-like &lt;code&gt;flake.nix&lt;&#x2F;code&gt; is not just aesthetically nicer. It reduces import churn. It reduces the feeling that every new module requires touching central wiring by hand. And it helps the flake describe the repo instead of micromanaging it.&lt;&#x2F;p&gt;
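&lt;p&gt;Much of what keeps &lt;code&gt;flake.nix&lt;&#x2F;code&gt; small is not listing every module by hand. Here is a minimal sketch of that idea. It assumes every &lt;code&gt;.nix&lt;&#x2F;code&gt; file under &lt;code&gt;.&#x2F;modules&lt;&#x2F;code&gt; is a module meant to be imported, and the helper name is hypothetical.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# lib&amp;#x2F;collect-modules.nix (hypothetical helper)
# Returns every .nix file below a directory, recursively, as a list of paths.
{ lib }:

dir:
builtins.filter
  (path: lib.hasSuffix &amp;quot;.nix&amp;quot; (toString path))
  (lib.filesystem.listFilesRecursive dir)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;In a flake-parts &lt;code&gt;flake.nix&lt;&#x2F;code&gt;, that list could feed the top-level module directly, for example &lt;code&gt;imports = import .&#x2F;lib&#x2F;collect-modules.nix { lib = inputs.nixpkgs.lib; } .&#x2F;modules;&lt;&#x2F;code&gt;. Adding a module then only means adding a file. The price is that the flake no longer spells out which files are in play.&lt;&#x2F;p&gt;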
&lt;h2 id=&quot;cross-platform-sharing-got-cleaner-too&quot;&gt;Cross-platform sharing got cleaner too&lt;&#x2F;h2&gt;
&lt;p&gt;One reason the old structure was getting strained is that the repo is not just NixOS hosts.&lt;&#x2F;p&gt;
&lt;p&gt;It spans:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;NixOS&lt;&#x2F;li&gt;
&lt;li&gt;nix-darwin&lt;&#x2F;li&gt;
&lt;li&gt;Home Manager&lt;&#x2F;li&gt;
&lt;li&gt;helper libraries&lt;&#x2F;li&gt;
&lt;li&gt;custom packages&lt;&#x2F;li&gt;
&lt;li&gt;patches&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Those are different configuration classes, but they often want to express the same &lt;em&gt;feature-level&lt;&#x2F;em&gt; concern. A workstation baseline is not only a NixOS thing. A deploy helper is not only a host concern. Shared development ergonomics may affect Home Manager and system modules together.&lt;&#x2F;p&gt;
&lt;p&gt;The Dendritic layout gives those things more coherent homes. You can keep related ideas near each other even when they cut across configuration classes.&lt;&#x2F;p&gt;
&lt;p&gt;That does not eliminate complexity, but it makes the complexity easier to name.&lt;&#x2F;p&gt;
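&lt;p&gt;Here is one concern spanning several configuration classes, as a hypothetical feature file. It assumes the flake-parts module that provides &lt;code&gt;flake.modules.&amp;lt;class&amp;gt;.&amp;lt;name&amp;gt;&lt;&#x2F;code&gt; is imported. The file and aspect names are illustrative.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# modules&amp;#x2F;community&amp;#x2F;dev-ergonomics.nix (hypothetical)
# One feature, contributed to each configuration class that needs it.
{
  flake.modules.nixos.devErgonomics = { pkgs, ... }: {
    environment.systemPackages = [ pkgs.git pkgs.ripgrep ];
  };

  flake.modules.darwin.devErgonomics = { pkgs, ... }: {
    environment.systemPackages = [ pkgs.git pkgs.ripgrep ];
  };

  flake.modules.homeManager.devErgonomics = {
    programs.git.enable = true;
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Each kind of host imports the matching class. The feature stays in one file even though it lands in three different module systems.&lt;&#x2F;p&gt;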
&lt;h2 id=&quot;the-tradeoff-you-buy-clarity-with-more-structure&quot;&gt;The tradeoff: you buy clarity with more structure&lt;&#x2F;h2&gt;
&lt;p&gt;This is where the cost shows up.&lt;&#x2F;p&gt;
&lt;p&gt;The new layout is clearer in the large, but it is not automatically simpler in the small.&lt;&#x2F;p&gt;
&lt;p&gt;Sometimes the old answer to “where do I put this?” was just “put it in the host.” That is crude, but it is low-friction. The new answer may involve:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;creating a new aspect&lt;&#x2F;li&gt;
&lt;li&gt;giving it a good name&lt;&#x2F;li&gt;
&lt;li&gt;deciding whether it is public or private&lt;&#x2F;li&gt;
&lt;li&gt;importing it in the right place&lt;&#x2F;li&gt;
&lt;li&gt;possibly adding inventory support&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That is more boilerplate.&lt;&#x2F;p&gt;
&lt;p&gt;Not absurd boilerplate, but real boilerplate. You are making more small files. You are adding more named composition points. You are spending more effort on repo shape.&lt;&#x2F;p&gt;
&lt;p&gt;For a repo at the right scale, that cost is worth paying. For a smaller repo, it can feel like architecture cosplay.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;indirection-is-the-other-real-cost&quot;&gt;Indirection is the other real cost&lt;&#x2F;h2&gt;
&lt;p&gt;Thin hosts are nice to read, but they move behavior elsewhere. That means tracing behavior can become harder.&lt;&#x2F;p&gt;
&lt;p&gt;If you are new to the repo and you open a host file, you might see clean imports and named aspects, but not the full behavior. To understand what the machine actually does, you have to follow the graph:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;from host&lt;&#x2F;li&gt;
&lt;li&gt;to aspect&lt;&#x2F;li&gt;
&lt;li&gt;to service tree&lt;&#x2F;li&gt;
&lt;li&gt;to shared helpers&lt;&#x2F;li&gt;
&lt;li&gt;maybe to inventory data&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That is a better model once you know the repo. It is not always a better first impression.&lt;&#x2F;p&gt;
&lt;p&gt;So the trade is basically this:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;host-centric layout gives you more immediate locality&lt;&#x2F;li&gt;
&lt;li&gt;Dendritic layout gives you better long-term semantic organization&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;There is no free lunch there.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;naming-becomes-architecture&quot;&gt;Naming becomes architecture&lt;&#x2F;h2&gt;
&lt;p&gt;This is one of the more interesting side effects.&lt;&#x2F;p&gt;
&lt;p&gt;When the repo is built around aspects and module trees, names matter more. A weak name creates confusion far beyond one file. A good name becomes a durable concept that helps the rest of the repo make sense.&lt;&#x2F;p&gt;
&lt;p&gt;That means the refactor shifts some of the burden from file placement to naming quality.&lt;&#x2F;p&gt;
&lt;p&gt;If an aspect is called something vague like &lt;code&gt;common&lt;&#x2F;code&gt; or &lt;code&gt;defaults&lt;&#x2F;code&gt;, it does not help much. If it is named for a clear responsibility like &lt;code&gt;serverBase&lt;&#x2F;code&gt;, &lt;code&gt;managedRemoteHost&lt;&#x2F;code&gt;, or &lt;code&gt;workstationSecrets&lt;&#x2F;code&gt;, then the host composition starts reading like intent instead of implementation trivia.&lt;&#x2F;p&gt;
&lt;p&gt;This is good architecture pressure, but it is still pressure. You have to keep the vocabulary disciplined.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;newcomers-may-feel-more-friction&quot;&gt;Newcomers may feel more friction&lt;&#x2F;h2&gt;
&lt;p&gt;This structure is not as immediately obvious as “there are some hosts and a configs directory.”&lt;&#x2F;p&gt;
&lt;p&gt;A newcomer now has to learn:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;the community&#x2F;private&#x2F;inventory split&lt;&#x2F;li&gt;
&lt;li&gt;what counts as an aspect&lt;&#x2F;li&gt;
&lt;li&gt;where shared behavior is expected to live&lt;&#x2F;li&gt;
&lt;li&gt;which modules are meant to be reused and which are just internal glue&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That is a steeper conceptual ramp.&lt;&#x2F;p&gt;
&lt;p&gt;The payoff is that once they learn it, the repo is more coherent. But the up-front learning cost is real, and I would not pretend otherwise.&lt;&#x2F;p&gt;
&lt;p&gt;Auto-imported module trees make this even more true. They keep the flake smaller and reduce manual wiring, which I like, but they also make the repo feel a little more magical. Explicit imports are noisier, but they are easier to explain in one sentence.&lt;&#x2F;p&gt;
&lt;p&gt;Again: tradeoff, not free win.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;verification-matters-more-during-refactors-like-this&quot;&gt;Verification matters more during refactors like this&lt;&#x2F;h2&gt;
&lt;p&gt;Even though I am not focusing this post on the migration itself, one lesson from the refactor is worth calling out because it connects directly to the structure question.&lt;&#x2F;p&gt;
&lt;p&gt;This style of reorganization makes semantic equivalence easy to &lt;em&gt;intend&lt;&#x2F;em&gt; and easy to get slightly wrong.&lt;&#x2F;p&gt;
&lt;p&gt;Most of the repo kept the same normalized configuration shape after the rewrite, which is exactly what I wanted. But careful comparison still caught a couple of genuine drifts:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ij-desktop&lt;&#x2F;code&gt; briefly lost &lt;code&gt;nix-command flakes&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;shared node-exporter factoring briefly implied port &lt;code&gt;9100&lt;&#x2F;code&gt; in a way that needed tightening back up&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Those are not catastrophic bugs. They are exactly the kind of subtle drift you get when you move behavior into reusable layers.&lt;&#x2F;p&gt;
&lt;p&gt;That does not argue against the layout. It argues for stronger verification when you reorganize into this style. The more you rely on reusable composition, the more you need confidence that the composition still evaluates to what you think it does.&lt;&#x2F;p&gt;
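&lt;p&gt;One mechanical way to get that confidence is to compare what each host evaluates to before and after the change. This sketch assumes two checkouts of the repo at the hypothetical paths &lt;code&gt;&#x2F;tmp&#x2F;dotfiles-before&lt;&#x2F;code&gt; and &lt;code&gt;&#x2F;tmp&#x2F;dotfiles-after&lt;&#x2F;code&gt;. It also assumes both define the same &lt;code&gt;nixosConfigurations&lt;&#x2F;code&gt; names.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# check-refactor.nix (hypothetical)
# Run with: nix eval --impure --file check-refactor.nix
# --impure is required because these flake references are not locked.
let
  oldFlake = builtins.getFlake &amp;quot;path:&amp;#x2F;tmp&amp;#x2F;dotfiles-before&amp;quot;;
  newFlake = builtins.getFlake &amp;quot;path:&amp;#x2F;tmp&amp;#x2F;dotfiles-after&amp;quot;;

  # The system derivation path reflects the entire evaluated configuration.
  drvOf = flake: host:
    flake.nixosConfigurations.${host}.config.system.build.toplevel.drvPath;
in
# Produces { hostname = true; ... }, where false marks a host that drifted.
builtins.listToAttrs (map
  (host: {
    name = host;
    value = drvOf oldFlake host == drvOf newFlake host;
  })
  (builtins.attrNames newFlake.nixosConfigurations))
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Matching derivation paths mean the refactor changed nothing the built system can see. A mismatch only tells you that something differs, and the difference may be harmless, such as list entries merged in a different order. Running a tool like &lt;code&gt;nix-diff&lt;&#x2F;code&gt; on the two derivations is the natural next step.&lt;&#x2F;p&gt;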
&lt;h2 id=&quot;so-was-it-worth-it&quot;&gt;So was it worth it?&lt;&#x2F;h2&gt;
&lt;p&gt;For this repo, yes.&lt;&#x2F;p&gt;
&lt;p&gt;Not because Dendritic is universally superior. Not because host-centric repos are wrong. Not because more module trees automatically mean better architecture.&lt;&#x2F;p&gt;
&lt;p&gt;It was worth it because this flake had already crossed the point where host-local logic, private inventory, reusable modules, public exports, and cross-platform sharing were all competing inside one old layout. The repo wanted stronger boundaries, and the Dendritic split gave it those.&lt;&#x2F;p&gt;
&lt;p&gt;What I gained was mostly clarity:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;clearer separation between public, private, and inventory concerns&lt;&#x2F;li&gt;
&lt;li&gt;thinner hosts that read more like manifests&lt;&#x2F;li&gt;
&lt;li&gt;easier reuse across NixOS, Darwin, Home Manager, packages, patches, and helpers&lt;&#x2F;li&gt;
&lt;li&gt;better path semantics and less junk-drawer ambiguity&lt;&#x2F;li&gt;
&lt;li&gt;a smaller, more declarative &lt;code&gt;flake.nix&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;What I paid was also real:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;more conceptual overhead&lt;&#x2F;li&gt;
&lt;li&gt;more indirection&lt;&#x2F;li&gt;
&lt;li&gt;more naming burden&lt;&#x2F;li&gt;
&lt;li&gt;more boilerplate around introducing new concepts cleanly&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That feels like the honest summary.&lt;&#x2F;p&gt;
&lt;p&gt;If your repo is still basically one or two machines with a few shared modules, I would not rush into this. A straightforward host-centric tree may still be the right tool. But if your flake is starting to feel like a mixed pile of hosts, helpers, secrets, service trees, and cross-platform behavior, then reorganizing around clearer module layers can buy back a lot of understanding.&lt;&#x2F;p&gt;
&lt;p&gt;Just do not mistake that for simplicity.&lt;&#x2F;p&gt;
&lt;p&gt;It is a trade from one kind of complexity to another. In this case, I think it was the right trade.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Legacy Deployment with Nix Flake Apps and systemd User Services</title>
        <published>2026-04-04T00:07:26+00:00</published>
        <updated>2026-04-04T00:07:26+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/legacy-deployment-nix-flake-apps-systemd-user-services/"/>
        <id>https://perlpimp.net/blog/legacy-deployment-nix-flake-apps-systemd-user-services/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/legacy-deployment-nix-flake-apps-systemd-user-services/">&lt;p&gt;You have a nice NixOS module for production. It builds the service, wires up secrets, configures systemd, maybe nginx too. Then reality intrudes: one box is still Debian, another is some inherited Ubuntu VM, and you still need to deploy the same service there without inventing a second operational universe.&lt;&#x2F;p&gt;
&lt;p&gt;The usual answer is “use Ansible,” or Docker, or a pile of shell scripts in &lt;code&gt;~&#x2F;bin&lt;&#x2F;code&gt; that quietly turn into a homegrown deployment system. But if the service already lives in a flake, you can keep the operational logic there too.&lt;&#x2F;p&gt;
&lt;p&gt;The pattern is simple: use flake &lt;code&gt;apps&lt;&#x2F;code&gt; outputs as your legacy deployment interface. &lt;code&gt;nix run .#setup&lt;&#x2F;code&gt;, &lt;code&gt;nix run .#install&lt;&#x2F;code&gt;, &lt;code&gt;nix run .#build-legacy&lt;&#x2F;code&gt;, &lt;code&gt;nix run .#dump-db&lt;&#x2F;code&gt; and so on. Same repo, same package graph, same binary. NixOS hosts use the module. Non-NixOS hosts use the apps.&lt;&#x2F;p&gt;
&lt;p&gt;This gives you one source of truth for build inputs and a second, lighter orchestration path for machines that are not ready for full NixOS.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-goal&quot;&gt;The goal&lt;&#x2F;h2&gt;
&lt;p&gt;You want the same flake to support two deployment modes:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;NixOS&lt;&#x2F;strong&gt;: declarative module, system service, hardened options, proper secret management&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Legacy Linux&lt;&#x2F;strong&gt;: unprivileged user account, &lt;code&gt;systemd --user&lt;&#x2F;code&gt;, environment files, wrapper scripts, and one-command deploys&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Not identical orchestration. Identical artifact.&lt;&#x2F;p&gt;
&lt;p&gt;That distinction matters. You do &lt;strong&gt;not&lt;&#x2F;strong&gt; want two build systems. You want one build and two ways to run it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;first-make-the-deploy-user-a-trusted-nix-user&quot;&gt;First: make the deploy user a trusted Nix user&lt;&#x2F;h2&gt;
&lt;p&gt;Before anything else, the user running &lt;code&gt;nix build&lt;&#x2F;code&gt; on the deploy target needs permission to pass Nix settings to the daemon. This is especially important if your flake has private GitHub inputs.&lt;&#x2F;p&gt;
&lt;p&gt;On the target host:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;ini&quot; class=&quot;language-ini &quot;&gt;&lt;code class=&quot;language-ini&quot; data-lang=&quot;ini&quot;&gt;# &amp;#x2F;etc&amp;#x2F;nix&amp;#x2F;nix.conf
trusted-users = root deploy-user
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then restart the daemon:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;sudo systemctl restart nix-daemon
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Without this, the deploy user can invoke Nix, but the daemon may ignore user-supplied settings such as access tokens or custom substituters. That turns into mysterious failures when a private flake input suddenly looks “missing.”&lt;&#x2F;p&gt;
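&lt;p&gt;A deploy script can check for this up front instead of discovering it through a failed fetch. The sketch below makes some assumptions. &lt;code&gt;is_trusted_user&lt;&#x2F;code&gt; is a hypothetical helper, it reads a single &lt;code&gt;trusted-users&lt;&#x2F;code&gt; line, and it only matches literal user names, so &lt;code&gt;@group&lt;&#x2F;code&gt; entries and &lt;code&gt;extra-trusted-users&lt;&#x2F;code&gt; are not handled.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: is this user listed by name in nix.conf's trusted-users?
# Assumes one trusted-users line; @group entries and extra-trusted-users are ignored.
is_trusted_user() {
  local user=$1 conf=${2:-/etc/nix/nix.conf} value entry
  # Keep only the value part of the trusted-users line (everything after "=").
  value=$(grep -E '^[[:space:]]*trusted-users[[:space:]]*=' "$conf" | tail -n 1 | cut -d= -f2-)
  # Word-split the value and compare each entry with the user name.
  for entry in $value; do
    if [ "$entry" = "$user" ]; then
      return 0
    fi
  done
  return 1
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Something like &lt;code&gt;is_trusted_user deploy-user || echo "add deploy-user to trusted-users"&lt;&#x2F;code&gt; at the top of a bootstrap script turns the mysterious failure into a one-line hint.&lt;&#x2F;p&gt;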
&lt;p&gt;If your flake pulls from private GitHub repos, add an access token in the deploy user’s config:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;ini&quot; class=&quot;language-ini &quot;&gt;&lt;code class=&quot;language-ini&quot; data-lang=&quot;ini&quot;&gt;# ~&amp;#x2F;.config&amp;#x2F;nix&amp;#x2F;nix.conf
access-tokens = github.com=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Now &lt;code&gt;nix build&lt;&#x2F;code&gt; on that host can fetch private &lt;code&gt;github:&lt;&#x2F;code&gt; inputs directly. On NixOS you’d normally do this declaratively and inject the token from sops-nix or agenix. On a legacy host, this is usually a one-time machine bootstrap step.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;shared-paths-in-the-flake&quot;&gt;Shared paths in the flake&lt;&#x2F;h2&gt;
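&lt;p&gt;The apps all share the same path conventions, so define them once near the top of your per-system outputs (&lt;code&gt;perSystem&lt;&#x2F;code&gt; in flake-parts, or the function you pass to flake-utils’ &lt;code&gt;eachDefaultSystem&lt;&#x2F;code&gt;):&lt;&#x2F;p&gt;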
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;let
  projectDir = &amp;quot;$HOME&amp;#x2F;git&amp;#x2F;my-service&amp;quot;;
  binDir = &amp;quot;$HOME&amp;#x2F;bin&amp;#x2F;my-service&amp;quot;;
  aliasPrefix = &amp;quot;my_service&amp;quot;;

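  loadEnv = &amp;#x27;&amp;#x27;
    cd &amp;quot;${projectDir}&amp;quot;
    # writeShellApplication lints with shellcheck, which cannot follow these files at build time
    if [ -f .env.production ]; then
      set -a
      # shellcheck source=&amp;#x2F;dev&amp;#x2F;null
      source .env.production
      set +a
    elif [ -f .env.local ]; then
      set -a
      # shellcheck source=&amp;#x2F;dev&amp;#x2F;null
      source .env.local
      set +a
    fi
  &amp;#x27;&amp;#x27;;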
in
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This is the first useful trick: treat environment loading as a reusable shell fragment, not something each script reimplements badly.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;.env.production&lt;&#x2F;code&gt; wins over &lt;code&gt;.env.local&lt;&#x2F;code&gt;, which gives you one convention for deploy targets and another for local testing. &lt;code&gt;set -a&lt;&#x2F;code&gt; marks every variable assigned while the file is sourced for export. Every command the wrapper scripts launch therefore inherits those values, while the generated systemd unit reads &lt;code&gt;.env.production&lt;&#x2F;code&gt; itself through &lt;code&gt;EnvironmentFile=&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
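&lt;p&gt;You can see both behaviors outside Nix with two hypothetical helpers. &lt;code&gt;env_source_for&lt;&#x2F;code&gt; reports which file the fragment would pick in a given directory. &lt;code&gt;demo_set_a&lt;&#x2F;code&gt; shows that a variable assigned under &lt;code&gt;set -a&lt;&#x2F;code&gt; reaches a child process without an explicit &lt;code&gt;export&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: report which env file the loadEnv fragment would source.
env_source_for() {
  local dir=$1
  if [ -f "$dir/.env.production" ]; then
    echo "$dir/.env.production"
  elif [ -f "$dir/.env.local" ]; then
    echo "$dir/.env.local"
  else
    echo "none"
  fi
}

# set -a exports every variable assigned while it is active,
# so a child process (here a fresh bash) can read DEMO_PORT.
demo_set_a() {
  set -a
  DEMO_PORT=9090
  set +a
  bash -c 'echo "child sees DEMO_PORT=$DEMO_PORT"'
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If both files exist, &lt;code&gt;env_source_for&lt;&#x2F;code&gt; names &lt;code&gt;.env.production&lt;&#x2F;code&gt;. That is the same priority order the fragment applies.&lt;&#x2F;p&gt;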
&lt;h2 id=&quot;app-1-setup-for-first-time-validation&quot;&gt;App 1: &lt;code&gt;setup&lt;&#x2F;code&gt; for first-time validation&lt;&#x2F;h2&gt;
&lt;p&gt;Your first deploy should not start with “run a giant script and hope.” Give yourself a small setup app that creates directories and validates the build output:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;apps.setup = flake-utils.lib.mkApp {
  drv = pkgs.writeShellApplication {
    name = &amp;quot;setup&amp;quot;;
    text = &amp;#x27;&amp;#x27;
      ${loadEnv}

      echo &amp;quot;Creating data directory...&amp;quot;
      mkdir -p &amp;quot;${projectDir}&amp;#x2F;.data&amp;quot;

      if [ ! -L &amp;quot;${projectDir}&amp;#x2F;result&amp;quot; ]; then
        echo &amp;quot;Run &amp;#x27;nix build&amp;#x27; first, then re-run setup.&amp;quot;
        exit 1
      fi

      echo &amp;quot;Validating configuration...&amp;quot;
      &amp;quot;${projectDir}&amp;#x2F;result&amp;#x2F;bin&amp;#x2F;my-service&amp;quot; --check-config

      echo &amp;quot;Setup complete.&amp;quot;
    &amp;#x27;&amp;#x27;;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;--check-config&lt;&#x2F;code&gt; flag is a placeholder for whatever self-check your service binary offers. The point is to let the service validate itself before you wire it into systemd. If your app has no config-check command, replace it with whatever proves the environment is sane: a database ping, a migration status command, a dry-run boot.&lt;&#x2F;p&gt;
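&lt;p&gt;Once you have several such checks, a small runner keeps &lt;code&gt;setup&lt;&#x2F;code&gt; readable. This sketch uses a hypothetical &lt;code&gt;preflight&lt;&#x2F;code&gt; function. It runs every check rather than stopping at the first failure, prints one line per check, and returns the number of failures.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical preflight runner: each argument is a shell command to try.
# Every check runs in a fresh bash in the current directory; the return
# status is the number of failed checks (0 means everything passed).
preflight() {
  local check failed=0
  for check in "$@"; do
    if bash -c "$check"; then
      echo "ok:   $check"
    else
      echo "FAIL: $check"
      failed=$((failed + 1))
    fi
  done
  return "$failed"
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For example, run &lt;code&gt;preflight "test -x result&#x2F;bin&#x2F;my-service" "test -d .data"&lt;&#x2F;code&gt; from the project directory. Because &lt;code&gt;writeShellApplication&lt;&#x2F;code&gt; enables &lt;code&gt;errexit&lt;&#x2F;code&gt;, a non-zero return ends the setup script, which is usually what you want.&lt;&#x2F;p&gt;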
&lt;h2 id=&quot;app-2-build-legacy-as-the-deploy-button&quot;&gt;App 2: &lt;code&gt;build-legacy&lt;&#x2F;code&gt; as the deploy button&lt;&#x2F;h2&gt;
&lt;p&gt;This is the core of the pattern. You want a single command that stops the old service, builds the current flake, writes the systemd user unit, reloads it, and starts the service again.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;apps.build-legacy = flake-utils.lib.mkApp {
  drv = pkgs.writeShellApplication {
    name = &amp;quot;build-legacy&amp;quot;;
    runtimeInputs = with pkgs; [ systemd ];
    text = &amp;#x27;&amp;#x27;
      set -euo pipefail
      ${loadEnv}

      echo &amp;quot;Stopping service...&amp;quot;
      systemctl --user stop my-service || true

      echo &amp;quot;Building...&amp;quot;
      nix build

      echo &amp;quot;Writing systemd unit...&amp;quot;
      mkdir -p &amp;quot;$HOME&amp;#x2F;.config&amp;#x2F;systemd&amp;#x2F;user&amp;quot;
      cat &amp;gt; &amp;quot;$HOME&amp;#x2F;.config&amp;#x2F;systemd&amp;#x2F;user&amp;#x2F;my-service.service&amp;quot; &amp;lt;&amp;lt;UNIT
      [Unit]
      Description=My Service Daemon

      [Service]
      Type=exec
      ExecStart=${projectDir}&amp;#x2F;result&amp;#x2F;bin&amp;#x2F;my-service
      Environment=SERVICE_PORT=&amp;#x27;&amp;#x27;${SERVICE_PORT:-8080}
      Environment=SERVICE_BIND_ADDRESS=&amp;#x27;&amp;#x27;${SERVICE_BIND_ADDRESS:-127.0.0.1}
      Environment=SERVICE_DB_PATH=&amp;#x27;&amp;#x27;${SERVICE_DB_PATH:-${projectDir}&amp;#x2F;.data&amp;#x2F;my-service.db}
      EnvironmentFile=-${projectDir}&amp;#x2F;.env.production
      Restart=on-failure
      RestartSec=5

      [Install]
      WantedBy=default.target
      UNIT

      systemctl --user daemon-reload
      systemctl --user enable --now my-service
      echo &amp;quot;Service started.&amp;quot;
    &amp;#x27;&amp;#x27;;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A few design choices here are doing real work:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;systemctl --user&lt;&#x2F;code&gt; keeps the whole thing unprivileged. No root-owned unit, no &lt;code&gt;sudo systemctl restart&lt;&#x2F;code&gt;, no system-level service management for a simple single-user deploy.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;systemctl --user stop ... || true&lt;&#x2F;code&gt; keeps the very first deploy from failing, because there is nothing to stop yet.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;ExecStart=${projectDir}&#x2F;result&#x2F;bin&#x2F;my-service&lt;&#x2F;code&gt; points at the current &lt;code&gt;nix build&lt;&#x2F;code&gt; symlink, so each deploy naturally flips the unit to the newest build result.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;Environment=&lt;&#x2F;code&gt; lines define safe defaults, while &lt;code&gt;EnvironmentFile=-...&lt;&#x2F;code&gt; lets &lt;code&gt;.env.production&lt;&#x2F;code&gt; override them at runtime.&lt;&#x2F;li&gt;
&lt;li&gt;The &lt;code&gt;-&lt;&#x2F;code&gt; on &lt;code&gt;EnvironmentFile&lt;&#x2F;code&gt; is important. It tells systemd “missing file is fine.”&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That last point keeps first boot and bootstrap paths simple. Your unit should not explode just because the env file is absent.&lt;&#x2F;p&gt;
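&lt;p&gt;One more bootstrap step matters for &lt;code&gt;systemd --user&lt;&#x2F;code&gt;. By default the user manager only runs while the account has a login session, so the service stops when the deploy user logs out and does not come back at boot. Running &lt;code&gt;loginctl enable-linger deploy-user&lt;&#x2F;code&gt; once, as root, keeps a user manager running for that account from boot onward.&lt;&#x2F;p&gt;
&lt;p&gt;The sketch below checks for this. &lt;code&gt;linger_enabled&lt;&#x2F;code&gt; and &lt;code&gt;ensure_linger&lt;&#x2F;code&gt; are hypothetical helpers. They assume logind records lingering users as files under &lt;code&gt;&#x2F;var&#x2F;lib&#x2F;systemd&#x2F;linger&lt;&#x2F;code&gt;, as current systemd versions do.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Assumption: logind marks lingering users with a file named after the user here.
# LINGER_DIR is overridable only so the check can be exercised without root.
linger_enabled() {
  local user=$1 dir=${LINGER_DIR:-/var/lib/systemd/linger}
  [ -e "$dir/$user" ]
}

# Report the state, and print the one-time fix when lingering is off.
ensure_linger() {
  local user=$1
  if linger_enabled "$user"; then
    echo "lingering enabled for $user"
  else
    echo "lingering disabled; run once as root: loginctl enable-linger $user"
    return 1
  fi
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;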
&lt;h2 id=&quot;why-generate-the-unit-from-nix-instead-of-committing-it&quot;&gt;Why generate the unit from Nix instead of committing it&lt;&#x2F;h2&gt;
&lt;p&gt;Because the unit is part of the deployment interface, and the deployment interface belongs next to the binary definition.&lt;&#x2F;p&gt;
&lt;p&gt;If you commit &lt;code&gt;deploy&#x2F;my-service.service&lt;&#x2F;code&gt; separately, it will drift. Someone tweaks the binary path, or adds a new env var, or changes restart behavior, and now the Nix package and the checked-in unit file disagree. Generating the unit inside &lt;code&gt;writeShellApplication&lt;&#x2F;code&gt; keeps the operational wrapper versioned with the build logic.&lt;&#x2F;p&gt;
&lt;p&gt;It also makes the unit templated by construction. Paths, ports, binary names, env conventions: they all come from the same Nix values used elsewhere in the flake.&lt;&#x2F;p&gt;
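&lt;p&gt;The “templated by construction” idea can also be sketched in plain bash. One function renders the unit from a couple of values. A second compares that rendering with the installed file, so a hand-edited or stale unit shows up as drift. Both are hypothetical helpers, and this unit is trimmed relative to the one &lt;code&gt;build-legacy&lt;&#x2F;code&gt; writes (it has no &lt;code&gt;Environment=&lt;&#x2F;code&gt; defaults).&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical renderer: prints a user unit derived from the project dir and binary name.
render_unit() {
  local project_dir=$1 name=$2
  printf '%s\n' \
    '[Unit]' \
    "Description=${name} daemon" \
    '' \
    '[Service]' \
    'Type=exec' \
    "ExecStart=${project_dir}/result/bin/${name}" \
    "EnvironmentFile=-${project_dir}/.env.production" \
    'Restart=on-failure' \
    'RestartSec=5' \
    '' \
    '[Install]' \
    'WantedBy=default.target'
}

# Succeeds when the installed unit differs from a fresh rendering.
# cmp exits 0 (same), 1 (different) or 2 (missing file); negating the
# pipeline makes both "different" and "missing" count as drift.
unit_drifted() {
  local unit_file=$1 project_dir=$2 name=$3
  ! render_unit "$project_dir" "$name" | cmp -s - "$unit_file"
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Running &lt;code&gt;unit_drifted "$HOME&#x2F;.config&#x2F;systemd&#x2F;user&#x2F;my-service.service" "$HOME&#x2F;git&#x2F;my-service" my-service&lt;&#x2F;code&gt; before a deploy tells you whether someone changed the unit outside the flake.&lt;&#x2F;p&gt;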
&lt;h2 id=&quot;app-3-install-for-management-scripts-and-aliases&quot;&gt;App 3: &lt;code&gt;install&lt;&#x2F;code&gt; for management scripts and aliases&lt;&#x2F;h2&gt;
&lt;p&gt;Once the service exists, you want a decent operator experience. Not “remember six long commands.” Real commands in &lt;code&gt;~&#x2F;bin&#x2F;my-service&lt;&#x2F;code&gt;, plus shell aliases for fish, zsh, and bash.&lt;&#x2F;p&gt;
&lt;p&gt;The basic shape looks like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;apps.install = flake-utils.lib.mkApp {
  drv = pkgs.writeShellApplication {
    name = &amp;quot;install&amp;quot;;
    text = &amp;#x27;&amp;#x27;
      set -euo pipefail

      INSTALL_DIR=&amp;quot;${binDir}&amp;quot;
      mkdir -p &amp;quot;$INSTALL_DIR&amp;quot;

      cat &amp;gt; &amp;quot;$INSTALL_DIR&amp;#x2F;build&amp;quot; &amp;lt;&amp;lt;&amp;#x27;EOF&amp;#x27;
      #!&amp;#x2F;usr&amp;#x2F;bin&amp;#x2F;env bash
      set -euo pipefail
      cd &amp;quot;${projectDir}&amp;quot;
      git pull --ff-only
      nix run .#build-legacy
      EOF
      chmod +x &amp;quot;$INSTALL_DIR&amp;#x2F;build&amp;quot;

      cat &amp;gt; &amp;quot;$INSTALL_DIR&amp;#x2F;tail-log&amp;quot; &amp;lt;&amp;lt;&amp;#x27;EOF&amp;#x27;
      #!&amp;#x2F;usr&amp;#x2F;bin&amp;#x2F;env bash
      exec journalctl --user -u my-service -f
      EOF
      chmod +x &amp;quot;$INSTALL_DIR&amp;#x2F;tail-log&amp;quot;

      cat &amp;gt; &amp;quot;$INSTALL_DIR&amp;#x2F;bash_setup&amp;quot; &amp;lt;&amp;lt;EOF
      #!&amp;#x2F;usr&amp;#x2F;bin&amp;#x2F;env bash
      echo &amp;quot;alias ${aliasPrefix}_build=&amp;#x27;${binDir}&amp;#x2F;build&amp;#x27;&amp;quot;
      echo &amp;quot;alias ${aliasPrefix}_tail_log=&amp;#x27;${binDir}&amp;#x2F;tail-log&amp;#x27;&amp;quot;
      EOF

      cat &amp;gt; &amp;quot;$INSTALL_DIR&amp;#x2F;fish_setup&amp;quot; &amp;lt;&amp;lt;EOF
      #!&amp;#x2F;usr&amp;#x2F;bin&amp;#x2F;env bash
      echo &amp;quot;alias ${aliasPrefix}_build &amp;#x27;${binDir}&amp;#x2F;build&amp;#x27;&amp;quot;
      echo &amp;quot;alias ${aliasPrefix}_tail_log &amp;#x27;${binDir}&amp;#x2F;tail-log&amp;#x27;&amp;quot;
      EOF
      # the shell init executes these files, so they need the exec bit
      chmod +x &amp;quot;$INSTALL_DIR&amp;#x2F;bash_setup&amp;quot; &amp;quot;$INSTALL_DIR&amp;#x2F;fish_setup&amp;quot;
    &amp;#x27;&amp;#x27;;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;In a real flake you’d factor that more cleanly, probably with a small helper that writes wrapped scripts and another that emits shell-specific alias syntax. The important part is the architecture:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;install&lt;&#x2F;code&gt; creates a stable operator-facing command set in &lt;code&gt;~&#x2F;bin&#x2F;my-service&#x2F;&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;each command is a tiny wrapper around one task&lt;&#x2F;li&gt;
&lt;li&gt;shell integration is generated, not handwritten in three places&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That generated alias layer is worth more than it looks. It means the deploy user logs in and has &lt;code&gt;my_service_build&lt;&#x2F;code&gt;, &lt;code&gt;my_service_tail_log&lt;&#x2F;code&gt;, &lt;code&gt;my_service_dump_db&lt;&#x2F;code&gt;, &lt;code&gt;my_service_shell&lt;&#x2F;code&gt;, whatever else you provide, with no shell-specific maintenance burden.&lt;&#x2F;p&gt;
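&lt;p&gt;The helper suggested above that “emits shell-specific alias syntax” might look like the following sketch. &lt;code&gt;emit_aliases&lt;&#x2F;code&gt; is hypothetical. It maps dash-separated command names such as &lt;code&gt;tail-log&lt;&#x2F;code&gt; to underscore alias names such as &lt;code&gt;my_service_tail_log&lt;&#x2F;code&gt;, matching the convention in the generated files.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: print alias definitions for one shell family.
# Usage: emit_aliases SHELL PREFIX BIN_DIR COMMAND...
emit_aliases() {
  local shell=$1 prefix=$2 bin_dir=$3 cmd
  shift 3
  for cmd in "$@"; do
    # ${cmd//-/_} turns "tail-log" into "tail_log" for the alias name.
    case $shell in
      bash|zsh) printf "alias %s_%s='%s/%s'\n" "$prefix" "${cmd//-/_}" "$bin_dir" "$cmd" ;;
      fish)     printf "alias %s_%s '%s/%s'\n" "$prefix" "${cmd//-/_}" "$bin_dir" "$cmd" ;;
      *)        echo "unsupported shell: $shell"; return 1 ;;
    esac
  done
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Because one list of command names drives every shell variant, adding a command means adding one word to one call instead of editing three files.&lt;&#x2F;p&gt;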
&lt;h2 id=&quot;app-4-database-dumps-as-part-of-the-flake&quot;&gt;App 4: database dumps as part of the flake&lt;&#x2F;h2&gt;
&lt;p&gt;If your “legacy deployment workflow” does not include a backup command, it is not a deployment workflow. It is an optimism framework.&lt;&#x2F;p&gt;
&lt;p&gt;Make a &lt;code&gt;dump-db&lt;&#x2F;code&gt; app and pin the toolchain with Nix just like everything else.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;sqlite&quot;&gt;SQLite&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;apps.dump-db = flake-utils.lib.mkApp {
  drv = pkgs.writeShellApplication {
    name = &amp;quot;dump-db&amp;quot;;
    runtimeInputs = with pkgs; [ sqlite bzip2 ];
    text = &amp;#x27;&amp;#x27;
      set -euo pipefail
      ${loadEnv}

      DB_PATH=&amp;quot;&amp;#x27;&amp;#x27;${SERVICE_DB_PATH:-${projectDir}&amp;#x2F;.data&amp;#x2F;my-service.db}&amp;quot;
      OUTPUT=&amp;quot;db-export-$(date +%Y%m%d_%H%M%S).sql.bz2&amp;quot;

      sqlite3 &amp;quot;$DB_PATH&amp;quot; .dump | bzip2 &amp;gt; &amp;quot;$OUTPUT&amp;quot;
      echo &amp;quot;Exported to $OUTPUT&amp;quot;
    &amp;#x27;&amp;#x27;;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
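&lt;p&gt;The timestamp in the file name is fixed-width (&lt;code&gt;%Y%m%d_%H%M%S&lt;&#x2F;code&gt;), so plain lexical sorting is also chronological. That makes “find the newest dump” trivial. Here is a sketch with a hypothetical &lt;code&gt;latest_dump&lt;&#x2F;code&gt; helper, assuming all dumps land in one directory:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: print the newest db-export-*.sql.bz2 in a directory.
# Fixed-width timestamps mean lexical order equals chronological order.
latest_dump() {
  local dir=${1:-.}
  (
    # With nullglob, no matches gives an empty array instead of the literal pattern.
    shopt -s nullglob
    files=("$dir"/db-export-*.sql.bz2)
    [ "${#files[@]}" -gt 0 ] || exit 1
    printf '%s\n' "${files[@]}" | sort | tail -n 1
  )
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;.dump&lt;&#x2F;code&gt; emits plain SQL, so restoring into a fresh SQLite file is then &lt;code&gt;bzcat "$(latest_dump)" | sqlite3 restored.db&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;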
&lt;h3 id=&quot;postgresql&quot;&gt;PostgreSQL&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;runtimeInputs = with pkgs; [ postgresql bzip2 ];
text = &amp;#x27;&amp;#x27;
  set -euo pipefail
  ${loadEnv}

  OUTPUT=&amp;quot;db-export-$(date +%Y%m%d_%H%M%S).sql.bz2&amp;quot;
  pg_dump \
    --host=&amp;quot;&amp;#x27;&amp;#x27;${DB_HOST:-localhost}&amp;quot; \
    --port=&amp;quot;&amp;#x27;&amp;#x27;${DB_PORT:-5432}&amp;quot; \
    --username=&amp;quot;&amp;#x27;&amp;#x27;${DB_USER}&amp;quot; \
    --dbname=&amp;quot;&amp;#x27;&amp;#x27;${DB_NAME}&amp;quot; \
    --no-owner --no-acl \
    | bzip2 &amp;gt; &amp;quot;$OUTPUT&amp;quot;
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;mysql-or-mariadb&quot;&gt;MySQL or MariaDB&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;runtimeInputs = with pkgs; [ mariadb bzip2 ];
text = &amp;#x27;&amp;#x27;
  set -euo pipefail
  ${loadEnv}

  OUTPUT=&amp;quot;db-export-$(date +%Y%m%d_%H%M%S).sql.bz2&amp;quot;
  mysqldump \
    --host=&amp;quot;&amp;#x27;&amp;#x27;${DB_HOST:-localhost}&amp;quot; \
    --port=&amp;quot;&amp;#x27;&amp;#x27;${DB_PORT:-3306}&amp;quot; \
    --user=&amp;quot;&amp;#x27;&amp;#x27;${DB_USER}&amp;quot; \
    --single-transaction \
    --routines \
    &amp;quot;&amp;#x27;&amp;#x27;${DB_NAME}&amp;quot; \
    | bzip2 &amp;gt; &amp;quot;$OUTPUT&amp;quot;
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This is exactly the kind of thing people leave to tribal knowledge: “oh, just remember the right &lt;code&gt;pg_dump&lt;&#x2F;code&gt; flags.” Put it in the flake instead. Then your deploy workflow, your operator docs, and your actual tooling all agree.&lt;&#x2F;p&gt;
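&lt;p&gt;Both server-backed variants expand &lt;code&gt;${DB_USER}&lt;&#x2F;code&gt; and &lt;code&gt;${DB_NAME}&lt;&#x2F;code&gt; with no default. Since &lt;code&gt;writeShellApplication&lt;&#x2F;code&gt; turns on &lt;code&gt;nounset&lt;&#x2F;code&gt;, a missing setting aborts with a bare “unbound variable” error. A small guard gives a clearer message. Here it is sketched in plain bash as a hypothetical &lt;code&gt;require_env&lt;&#x2F;code&gt; helper:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical guard: name every missing setting at once, instead of
# dying on the first unbound variable halfway through a dump.
require_env() {
  local name missing=()
  for name in "$@"; do
    # ${!name:-} reads the variable whose name is stored in $name,
    # defaulting to empty so nounset does not trip here.
    if [ -z "${!name:-}" ]; then
      missing+=("$name")
    fi
  done
  if [ "${#missing[@]}" -gt 0 ]; then
    echo "missing required settings: ${missing[*]}"
    return 1
  fi
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Call it as &lt;code&gt;require_env DB_USER DB_NAME&lt;&#x2F;code&gt; right after &lt;code&gt;${loadEnv}&lt;&#x2F;code&gt;. If you paste it into a Nix indented string, its own &lt;code&gt;${...}&lt;&#x2F;code&gt; expansions need the same &lt;code&gt;''${&lt;&#x2F;code&gt; escape the dump snippets use.&lt;&#x2F;p&gt;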
&lt;h2 id=&quot;app-5-repl-access-for-the-running-service&quot;&gt;App 5: REPL access for the running service&lt;&#x2F;h2&gt;
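&lt;p&gt;The same pattern works for interactive operational tools. The snippets below use &lt;code&gt;mkScript&lt;&#x2F;code&gt; as a stand-in for the small script-writing helper mentioned earlier. A thin wrapper such as &lt;code&gt;mkScript = name: text: pkgs.writeShellScriptBin name text;&lt;&#x2F;code&gt; would be enough.&lt;&#x2F;p&gt;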
&lt;p&gt;For an Elixir release:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;replScript = mkScript &amp;quot;iex&amp;quot; &amp;#x27;&amp;#x27;
  ${loadEnv}
  exec &amp;quot;${projectDir}&amp;#x2F;_build&amp;#x2F;prod&amp;#x2F;rel&amp;#x2F;my_app&amp;#x2F;bin&amp;#x2F;my_app&amp;quot; remote
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That connects an IEx shell to the running BEAM node. Same idea for Rails:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;replScript = mkScript &amp;quot;console&amp;quot; &amp;#x27;&amp;#x27;
  ${loadEnv}
  cd &amp;quot;${projectDir}&amp;quot;
  exec bundle exec rails console -e production
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Or Django:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;replScript = mkScript &amp;quot;shell&amp;quot; &amp;#x27;&amp;#x27;
  ${loadEnv}
  cd &amp;quot;${projectDir}&amp;quot;
  exec python manage.py shell
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The trick is not language-specific. Load the environment, move into the right directory, run the service-specific console command. Then wire that script into &lt;code&gt;install&lt;&#x2F;code&gt; and export an alias for it like any other operational command.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-motd-banner-is-worth-it&quot;&gt;The MOTD banner is worth it&lt;&#x2F;h2&gt;
&lt;p&gt;If the deploy target has a dedicated service user, make login useful. A small generated MOTD script gives you a dashboard instead of a blank prompt:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;motdScript = mkScript &amp;quot;motd&amp;quot; &amp;#x27;&amp;#x27;
  printf &amp;#x27;\n&amp;#x27;
  printf &amp;#x27;  \033[1;36m%s\033[0m\n&amp;#x27; &amp;quot;My Service&amp;quot;
  printf &amp;#x27;\n&amp;#x27;
  printf &amp;#x27;  Location:          %s\n&amp;#x27; &amp;quot;${projectDir}&amp;quot;
  printf &amp;#x27;  Environment file:  %s&amp;#x2F;.env.production\n&amp;#x27; &amp;quot;${projectDir}&amp;quot;
  printf &amp;#x27;\n&amp;#x27;
  printf &amp;#x27;  \033[1mAliases:\033[0m\n&amp;#x27;
  printf &amp;#x27;    %-28s %s\n&amp;#x27; &amp;quot;${aliasPrefix}_build&amp;quot;    &amp;quot;pull, stop, build, restart&amp;quot;
  printf &amp;#x27;    %-28s %s\n&amp;#x27; &amp;quot;${aliasPrefix}_tail_log&amp;quot; &amp;quot;stream service logs&amp;quot;
  printf &amp;#x27;    %-28s %s\n&amp;#x27; &amp;quot;${aliasPrefix}_dump_db&amp;quot;  &amp;quot;export database backup&amp;quot;
  printf &amp;#x27;    %-28s %s\n&amp;#x27; &amp;quot;${aliasPrefix}_db_shell&amp;quot; &amp;quot;open database shell&amp;quot;
  printf &amp;#x27;    %-28s %s\n&amp;#x27; &amp;quot;${aliasPrefix}_iex&amp;quot;      &amp;quot;attach remote console&amp;quot;
  printf &amp;#x27;\n&amp;#x27;
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then call it from the service user’s shell init after loading the alias file:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;eval &amp;quot;$(~&amp;#x2F;bin&amp;#x2F;my-service&amp;#x2F;bash_setup)&amp;quot;
~&amp;#x2F;bin&amp;#x2F;my-service&amp;#x2F;motd
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Or in fish:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;fish&quot; class=&quot;language-fish &quot;&gt;&lt;code class=&quot;language-fish&quot; data-lang=&quot;fish&quot;&gt;~&amp;#x2F;bin&amp;#x2F;my-service&amp;#x2F;fish_setup | source
~&amp;#x2F;bin&amp;#x2F;my-service&amp;#x2F;motd
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This is not just cosmetic. The banner becomes a tiny operational contract: what service this account owns, where it lives, and which commands matter.&lt;&#x2F;p&gt;
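&lt;p&gt;The alias section of that banner is written by hand, so it can quietly fall out of step with what &lt;code&gt;install&lt;&#x2F;code&gt; actually generates. One way to keep the contract honest is to print that section from the same list of commands. Here is a sketch with a hypothetical &lt;code&gt;motd_table&lt;&#x2F;code&gt; helper that takes &lt;code&gt;name:description&lt;&#x2F;code&gt; pairs:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: print the banner's alias table from "name:description" pairs.
# Usage: motd_table PREFIX NAME:DESCRIPTION...
motd_table() {
  local prefix=$1 entry name desc
  shift
  printf '  Aliases:\n'
  for entry in "$@"; do
    name=${entry%%:*}   # text before the first colon
    desc=${entry#*:}    # text after the first colon
    printf '    %-28s %s\n' "${prefix}_${name}" "$desc"
  done
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For example, &lt;code&gt;motd_table my_service "build:pull, stop, build, restart" "tail_log:stream service logs"&lt;&#x2F;code&gt; prints the same padded layout as the hand-written &lt;code&gt;printf&lt;&#x2F;code&gt; lines.&lt;&#x2F;p&gt;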
&lt;h2 id=&quot;the-environment-loading-pattern&quot;&gt;The environment loading pattern&lt;&#x2F;h2&gt;
&lt;p&gt;The env-loading fragment from earlier is doing two jobs:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;if [ -f .env.production ]; then
  set -a; source .env.production; set +a
elif [ -f .env.local ]; then
  set -a; source .env.local; set +a
fi
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;First, it gives production and local environments a deterministic priority order.&lt;&#x2F;p&gt;
&lt;p&gt;Second, it keeps every app consistent. &lt;code&gt;setup&lt;&#x2F;code&gt;, &lt;code&gt;build-legacy&lt;&#x2F;code&gt;, &lt;code&gt;dump-db&lt;&#x2F;code&gt;, &lt;code&gt;db-shell&lt;&#x2F;code&gt;, &lt;code&gt;iex&lt;&#x2F;code&gt;, any future admin script: they all see the same environment variables from the same place.&lt;&#x2F;p&gt;
&lt;p&gt;That consistency is the real benefit. Once you have six or eight wrapper commands, the fastest way to create weird behavior is to let each one load config differently.&lt;&#x2F;p&gt;
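&lt;p&gt;There is one practical consequence. The same &lt;code&gt;.env.production&lt;&#x2F;code&gt; is read by &lt;code&gt;source&lt;&#x2F;code&gt;, as shell code, and by systemd’s &lt;code&gt;EnvironmentFile=&lt;&#x2F;code&gt;, which has its own simpler parser. Keeping the file to plain &lt;code&gt;KEY=value&lt;&#x2F;code&gt; lines is a conservative way to make both readers agree.&lt;&#x2F;p&gt;
&lt;p&gt;The sketch below is a hypothetical &lt;code&gt;check_env_file&lt;&#x2F;code&gt; lint along those lines. It flags anything that is not a comment, a blank line, or a bare &lt;code&gt;KEY=value&lt;&#x2F;code&gt; assignment, such as &lt;code&gt;export KEY=value&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical lint: accept only comments, blank lines and plain KEY=value lines.
check_env_file() {
  local file=$1 bad
  # First grep: drop blank lines and comments, number the rest as "N:line".
  # Second grep: keep only the numbered lines that are NOT a bare KEY= assignment.
  bad=$(grep -nvE '^[[:space:]]*(#|$)' "$file" | grep -vE '^[0-9]+:[A-Za-z_][A-Za-z0-9_]*=' || true)
  if [ -n "$bad" ]; then
    printf 'not plain KEY=value:\n%s\n' "$bad"
    return 1
  fi
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;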
&lt;h2 id=&quot;the-contrast-with-nixos&quot;&gt;The contrast with NixOS&lt;&#x2F;h2&gt;
&lt;p&gt;At this point the pattern should be clear: the legacy path is not a compromise build. It is a compromise orchestration model.&lt;&#x2F;p&gt;
&lt;p&gt;Here is the practical comparison:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Aspect&lt;&#x2F;th&gt;&lt;th&gt;Legacy host via flake apps&lt;&#x2F;th&gt;&lt;th&gt;NixOS host via module&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Service manager&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;systemd --user&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;system-level &lt;code&gt;systemd&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Privileges&lt;&#x2F;td&gt;&lt;td&gt;unprivileged service user&lt;&#x2F;td&gt;&lt;td&gt;hardened service user &#x2F; &lt;code&gt;DynamicUser&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Secrets&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;.env.production&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;sops-nix &#x2F; agenix &#x2F; declarative injection&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Updates&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;nix run .#build-legacy&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;nixos-rebuild switch&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Reverse proxy&lt;&#x2F;td&gt;&lt;td&gt;manual nginx or caddy config&lt;&#x2F;td&gt;&lt;td&gt;declarative module options&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Backups&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;nix run .#dump-db&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;timer or service declared in NixOS&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Monitoring&lt;&#x2F;td&gt;&lt;td&gt;manual integration&lt;&#x2F;td&gt;&lt;td&gt;declarative Prometheus&#x2F;Grafana wiring&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;That is exactly the right relationship. NixOS should be the better deployment target. It gives you stronger isolation, stronger secret handling, stronger service definitions. But the legacy path still uses the same flake, same package graph, same env surface, and same operational commands.&lt;&#x2F;p&gt;
&lt;p&gt;That makes migration gradual instead of traumatic.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;daily-workflow-after-install&quot;&gt;Daily workflow after &lt;code&gt;install&lt;&#x2F;code&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Once you have installed the wrappers and sourced the aliases, operating the service should look boring:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;my_service_build
my_service_tail_log
my_service_dump_db
my_service_db_shell
my_service_iex
my_service_shell
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That is the whole point. If the workflow on a non-NixOS box still feels bespoke, you have not pushed enough of it into the flake yet.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-this-pattern-holds-up&quot;&gt;Why this pattern holds up&lt;&#x2F;h2&gt;
&lt;p&gt;The nice part of this approach is not that it is “fully declarative.” It isn’t. A legacy host is still a legacy host. You are still relying on a user account, a checked-out repo, a shell profile, and some one-time machine bootstrap.&lt;&#x2F;p&gt;
&lt;p&gt;The nice part is narrower and more useful:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;your deploy commands are versioned with the code&lt;&#x2F;li&gt;
&lt;li&gt;your build inputs stay pinned&lt;&#x2F;li&gt;
&lt;li&gt;your operational wrappers stay reproducible&lt;&#x2F;li&gt;
&lt;li&gt;your non-NixOS path does not fork into another toolchain&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;You stop treating “legacy deployment” as a totally separate system. It becomes another interface on the same flake.&lt;&#x2F;p&gt;
&lt;p&gt;If you eventually migrate the box to NixOS, great. The service definition gets better, but the build and runtime assumptions stay familiar. If you do not migrate it yet, you still have something coherent: one repo, one flake, one set of commands, two orchestration targets.&lt;&#x2F;p&gt;
&lt;p&gt;That is usually enough to keep the infrastructure sane.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Monitoring gpsd with Prometheus using prometheus-gpsd-exporter</title>
        <published>2026-04-03T00:04:00+00:00</published>
        <updated>2026-04-03T00:04:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/prometheus-gpsd-exporter-nixos/"/>
        <id>https://perlpimp.net/blog/prometheus-gpsd-exporter-nixos/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/prometheus-gpsd-exporter-nixos/">&lt;p&gt;GPS receivers spit out NMEA, gpsd parses it, and getting that data into Prometheus shouldn’t require babysitting a crashy exporter. Yet here we are.&lt;&#x2F;p&gt;
&lt;p&gt;The existing options for gpsd-to-Prometheus are a &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;brendanbank&#x2F;gpsd-prometheus-exporter&quot;&gt;Python exporter&lt;&#x2F;a&gt; and a &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;brendanbank&#x2F;gpsd-prometheus-exporter&#x2F;tree&#x2F;master&quot;&gt;Go exporter&lt;&#x2F;a&gt;. Both have the same fundamental problem: they don’t stay running. The Python one relies on the &lt;code&gt;gps&lt;&#x2F;code&gt; library, which throws &lt;code&gt;StopIteration&lt;&#x2F;code&gt; when the connection hiccups — unhandled, fatal, exporter dead. The Go one polls on a 10-second interval instead of streaming, calls &lt;code&gt;log.Fatal&lt;&#x2F;code&gt; on parse errors, has no reconnect logic, and uses different metric names from the Python exporter — so your Grafana dashboards break if you switch between them.&lt;&#x2F;p&gt;
&lt;p&gt;You end up restarting gpsd, restarting the exporter, and wondering why your satellite count graph has gaps every few hours.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;a-rust-rewrite-that-stays-up&quot;&gt;A Rust rewrite that stays up&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;ijohanne&#x2F;prometheus-gpsd-exporter&quot;&gt;prometheus-gpsd-exporter&lt;&#x2F;a&gt; replaces both. It’s a single async Rust binary — Tokio runtime, one persistent TCP connection to gpsd with streaming JSON, one HTTP server on &lt;code&gt;&#x2F;metrics&lt;&#x2F;code&gt;. The architecture is simple:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;gpsd (host A:2947) --TCP&amp;#x2F;JSON--&amp;gt; exporter (host B) --HTTP&amp;#x2F;metrics--&amp;gt; Prometheus
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Direct TCP connection, no fragile library dependency. It reads gpsd’s JSON stream, parses every message type that matters (TPV, SKY, GST, TOFF, PPS, OSC), and exposes them as Prometheus metrics. Unknown or malformed messages get logged and skipped instead of taking the exporter down.&lt;&#x2F;p&gt;
&lt;p&gt;When the connection drops, it reconnects with exponential backoff, starting at 10 seconds and capping at 5 minutes. No manual intervention, no systemd restart loops, and no gaps caused by a dead exporter process.&lt;&#x2F;p&gt;
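&lt;p&gt;Only the start (10 s) and the cap (5 min) are given here. Assuming the delay doubles after each failed attempt, the schedule works out as shown below. This is a bash illustration of the arithmetic, not the exporter’s Rust code, and &lt;code&gt;backoff_schedule&lt;&#x2F;code&gt; is a hypothetical name.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical illustration: reconnect delays (seconds) for N consecutive failures,
# assuming a doubling factor, a 10 s start and a 300 s cap.
backoff_schedule() {
  local attempts=$1 delay=10 cap=300 delays=() i
  for ((i = 0; i != attempts; i++)); do
    delays+=("$delay")
    delay=$((delay * 2))
    if [ "$delay" -gt "$cap" ]; then
      delay=$cap
    fi
  done
  echo "${delays[*]}"
}

backoff_schedule 7   # prints: 10 20 40 80 160 300 300
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Under that assumption, the cap is reached on the sixth attempt, after 10 + 20 + 40 + 80 + 160 = 310 seconds of cumulative waiting.&lt;&#x2F;p&gt;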
&lt;h2 id=&quot;getting-gpsd-to-cooperate&quot;&gt;Getting gpsd to cooperate&lt;&#x2F;h2&gt;
&lt;p&gt;A solid exporter is half the battle. The other half is getting gpsd to actually emit the data you want. Two things trip people up here: baud rate and network listening.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-baud-rate-problem&quot;&gt;The baud rate problem&lt;&#x2F;h3&gt;
&lt;p&gt;gpsd truncates SKY messages and other verbose output unless you tell it the correct baud rate for your receiver. Most modern GPS modules run at 115200 baud. Without the &lt;code&gt;-s&lt;&#x2F;code&gt; flag, gpsd defaults to 9600 and silently drops data — you get position fixes but no satellite details, no DOP values, no signal strength per satellite.&lt;&#x2F;p&gt;
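&lt;p&gt;The arithmetic behind the baud rate problem is simple. Assuming the usual 8N1 serial framing (10 bits on the wire per byte), a small Rust helper shows the byte budget at each speed; the 20-sentence burst in the note below is a hypothetical load, not a measurement:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;/// Payload bytes per second on a serial line with 8N1 framing:
/// 1 start bit + 8 data bits + 1 stop bit = 10 bits per byte.
fn bytes_per_second(baud: u32) -&amp;gt; u32 {
    baud / 10
}

/// Whether `sentences` lines of `len` bytes each fit into one second of link time.
fn fits(baud: u32, sentences: u32, len: u32) -&amp;gt; bool {
    sentences * len &amp;lt;= bytes_per_second(baud)
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;At 9600 baud that is 960 bytes per second, against 11,520 at 115200. A hypothetical burst of 20 NMEA sentences at the 82-byte maximum sentence length needs 1,640 bytes, which does not fit the slower budget at all.&lt;&#x2F;p&gt;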
&lt;p&gt;&lt;strong&gt;Traditional Linux (&lt;code&gt;&#x2F;etc&#x2F;default&#x2F;gpsd&lt;&#x2F;code&gt;):&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;GPSD_OPTIONS=&amp;quot;-G -s 115200&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;NixOS:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.gpsd = {
  enable = true;
  devices = [ &amp;quot;&amp;#x2F;dev&amp;#x2F;ttyUSB0&amp;quot; ];
  listenany = true;  # equivalent of -G
  extraArgs = [ &amp;quot;-s&amp;quot; &amp;quot;115200&amp;quot; ];
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;-G&lt;&#x2F;code&gt; flag (or &lt;code&gt;listenany&lt;&#x2F;code&gt; in NixOS) tells gpsd to listen on all interfaces instead of just localhost. You’ll need this if the exporter runs on a different host than the GPS receiver.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;making-gpsd-listen-on-the-network&quot;&gt;Making gpsd listen on the network&lt;&#x2F;h3&gt;
&lt;p&gt;On traditional Linux distributions, gpsd uses systemd socket activation. The default socket only listens on localhost. To expose gpsd over the network, you need to override the socket unit.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Traditional Linux (&lt;code&gt;systemctl edit gpsd.socket&lt;&#x2F;code&gt;):&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;ini&quot; class=&quot;language-ini &quot;&gt;&lt;code class=&quot;language-ini&quot; data-lang=&quot;ini&quot;&gt;[Socket]
ListenStream=
ListenStream=&amp;#x2F;var&amp;#x2F;run&amp;#x2F;gpsd.sock
ListenStream=[::]:2947
ListenStream=0.0.0.0:2947
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The empty &lt;code&gt;ListenStream=&lt;&#x2F;code&gt; clears the defaults first — without it, you’d be adding listeners on top of the existing ones. Then the subsequent lines add back the Unix socket plus all-interface TCP listeners. After editing, restart both units:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;sudo systemctl restart gpsd.socket gpsd.service
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;NixOS:&lt;&#x2F;strong&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The NixOS gpsd module doesn’t use socket activation — it runs a forking service directly. The &lt;code&gt;listenany = true&lt;&#x2F;code&gt; option handles network listening. If you need socket activation with both Unix and TCP sockets for some reason, define it explicitly:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;systemd.sockets.gpsd = {
  description = &amp;quot;GPS Daemon Sockets&amp;quot;;
  wantedBy = [ &amp;quot;sockets.target&amp;quot; ];
  socketConfig = {
    ListenStream = [
      &amp;quot;&amp;#x2F;run&amp;#x2F;gpsd.sock&amp;quot;
      &amp;quot;[::]:2947&amp;quot;
      &amp;quot;0.0.0.0:2947&amp;quot;
    ];
    SocketMode = &amp;quot;0600&amp;quot;;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
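&lt;p&gt;Most NixOS setups won’t need this. If the exporter runs on the same host, gpsd’s default localhost listener is already enough; if it runs elsewhere, &lt;code&gt;listenany = true&lt;&#x2F;code&gt; exposes TCP port 2947 on all interfaces. An explicit socket unit only matters if you also want socket activation.&lt;&#x2F;p&gt;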
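&lt;p&gt;Whichever way you expose gpsd, you can confirm the listener is reachable from the exporter’s host by connecting to port 2947 and reading one line, because gpsd sends a &lt;code&gt;VERSION&lt;&#x2F;code&gt; object to every new client. A minimal Rust check using only the standard library (the &lt;code&gt;gpsd_banner&lt;&#x2F;code&gt; name is a placeholder):&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;use std::io::{BufRead, BufReader};
use std::net::TcpStream;
use std::time::Duration;

/// Connect to gpsd and return the first line it sends. gpsd greets each new
/// client with a {"class":"VERSION",...} object, so receiving one proves the
/// TCP listener is reachable. The timeout only bounds the read; the connect
/// itself uses the operating system's default timeout.
fn gpsd_banner(addr: &amp;amp;str, timeout: Duration) -&amp;gt; std::io::Result&amp;lt;String&amp;gt; {
    let stream = TcpStream::connect(addr)?;
    stream.set_read_timeout(Some(timeout))?;
    let mut line = String::new();
    BufReader::new(stream).read_line(&amp;amp;mut line)?;
    Ok(line.trim_end().to_string())
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Run from the exporter host against &lt;code&gt;192.168.1.100:2947&lt;&#x2F;code&gt;, it should return a line starting with &lt;code&gt;{&quot;class&quot;:&quot;VERSION&quot;&lt;&#x2F;code&gt;; a refused or timed-out connection usually means gpsd is still bound to localhost or a firewall is in the way.&lt;&#x2F;p&gt;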
&lt;h2 id=&quot;setting-up-the-exporter&quot;&gt;Setting up the exporter&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;as-a-nix-flake&quot;&gt;As a Nix flake&lt;&#x2F;h3&gt;
&lt;p&gt;Add the exporter as a flake input and import the NixOS module:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs = {
    nixpkgs.url = &amp;quot;github:NixOS&amp;#x2F;nixpkgs&amp;#x2F;nixos-unstable&amp;quot;;
    prometheus-gpsd-exporter.url = &amp;quot;github:ijohanne&amp;#x2F;prometheus-gpsd-exporter&amp;quot;;
  };

  outputs = { nixpkgs, prometheus-gpsd-exporter, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      modules = [
        prometheus-gpsd-exporter.nixosModules.default
        .&amp;#x2F;configuration.nix
      ];
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;nixos-module-configuration&quot;&gt;NixOS module configuration&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.prometheus-gpsd-exporter = {
  enable = true;
  enableLocalScraping = true;
  gpsdHost = &amp;quot;192.168.1.100&amp;quot;;
  ppsHistogram = true;
  offsetFromGeopoint = true;
  geopointLat = 51.5074;
  geopointLon = -0.1278;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;enableLocalScraping&lt;&#x2F;code&gt; adds a Prometheus scrape target automatically — no manual &lt;code&gt;scrapeConfigs&lt;&#x2F;code&gt; editing. &lt;code&gt;gpsdHost&lt;&#x2F;code&gt; points at whichever machine runs your GPS receiver. The geopoint options enable geo-offset histograms — set them to your receiver’s actual coordinates. You can read the initial values from the &lt;code&gt;gpsd_lat&lt;&#x2F;code&gt; and &lt;code&gt;gpsd_long&lt;&#x2F;code&gt; metrics once the exporter is running, then pin them in your config.&lt;&#x2F;p&gt;
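&lt;p&gt;Geo-offset tracking comes down to the distance between each reported fix and the pinned geopoint. How the exporter computes that distance isn’t covered here, so treat this as an illustration only: the standard haversine great-circle formula in Rust, on a spherical Earth:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;/// Great-circle distance in meters between two points, using the haversine
/// formula on a spherical Earth with a 6,371 km mean radius.
/// Assumption: the exporter's own offset calculation may differ.
fn distance_m(lat1: f64, lon1: f64, lat2: f64, lon2: f64) -&amp;gt; f64 {
    const EARTH_RADIUS_M: f64 = 6_371_000.0;
    let (phi1, phi2) = (lat1.to_radians(), lat2.to_radians());
    let d_phi = (lat2 - lat1).to_radians();
    let d_lambda = (lon2 - lon1).to_radians();
    let a = (d_phi / 2.0).sin().powi(2)
        + phi1.cos() * phi2.cos() * (d_lambda / 2.0).sin().powi(2);
    2.0 * EARTH_RADIUS_M * a.sqrt().asin()
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;One degree of latitude works out to about 111.2 km on this model, so a 0.5 m histogram bucket spans only about 0.0000045 degrees. Four decimal places, as in the example config, resolve only about 11 m of latitude, which is why it pays to pin the full-precision values read from the metrics.&lt;&#x2F;p&gt;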
&lt;h3 id=&quot;standalone-binary&quot;&gt;Standalone binary&lt;&#x2F;h3&gt;
&lt;p&gt;Not on NixOS? The binary works anywhere:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;prometheus-gpsd-exporter \
  --hostname &amp;lt;gpsd-host&amp;gt; \
  --port 2947 \
  --exporter-port 9015 \
  --listen-address :: \
  --pps-histogram
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;cli-reference&quot;&gt;CLI reference&lt;&#x2F;h2&gt;
&lt;pre&gt;&lt;code&gt;-H, --hostname &amp;lt;HOST&amp;gt;            gpsd host [default: localhost]
-p, --port &amp;lt;PORT&amp;gt;                gpsd port [default: 2947]
-E, --exporter-port &amp;lt;PORT&amp;gt;       metrics port [default: 9015]
-L, --listen-address &amp;lt;ADDR&amp;gt;      listen address [default: ::]
-t, --timeout &amp;lt;SECS&amp;gt;             connection timeout [default: 10]
    --retry-delay &amp;lt;SECS&amp;gt;         initial retry delay [default: 10]
    --max-retry-delay &amp;lt;SECS&amp;gt;     max retry delay [default: 300]
-S, --disable-monitor-satellites disable per-satellite metrics
    --pps-histogram              enable PPS offset histograms
    --pps-bucket-size &amp;lt;NS&amp;gt;       PPS bucket width in ns [default: 250]
    --pps-bucket-count &amp;lt;N&amp;gt;       number of PPS buckets [default: 40]
    --pps-time1 &amp;lt;FLOAT&amp;gt;          PPS time1 offset [default: 0]
    --offset-from-geopoint       enable geo-offset tracking
    --geopoint-lat &amp;lt;FLOAT&amp;gt;       reference latitude
    --geopoint-lon &amp;lt;FLOAT&amp;gt;       reference longitude
    --geo-bucket-size &amp;lt;M&amp;gt;        geo bucket width in meters [default: 0.5]
    --geo-bucket-count &amp;lt;N&amp;gt;       number of geo buckets [default: 40]
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The PPS options are for precision timing setups — if you’re running an NTP server disciplined by a GPS PPS signal, the histograms show pulse offset distribution. The geo-offset options track how far your reported position drifts from a known fixed point, which is useful for evaluating antenna placement or receiver quality.&lt;&#x2F;p&gt;
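&lt;p&gt;Both histogram families are described by a bucket width and a bucket count. A Rust sketch of evenly spaced bounds, and of which bucket an observation lands in, assuming a linear layout that starts at one bucket width (the exporter’s actual layout, for example whether PPS buckets straddle zero, is not documented here):&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;/// Evenly spaced histogram upper bounds: start, start + width, ...
/// (count values in total). Assumption: the exporter's real layout may differ,
/// e.g. PPS buckets could straddle zero to cover negative offsets.
fn linear_bounds(start: f64, width: f64, count: usize) -&amp;gt; Vec&amp;lt;f64&amp;gt; {
    (0..count).map(|i| start + i as f64 * width).collect()
}

/// Index of the smallest bucket whose upper bound is &amp;gt;= value (Prometheus
/// buckets are cumulative "less than or equal" bounds); None means +Inf.
fn bucket_index(bounds: &amp;amp;[f64], value: f64) -&amp;gt; Option&amp;lt;usize&amp;gt; {
    bounds.iter().position(|&amp;amp;b| value &amp;lt;= b)
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With the PPS defaults the bounds run from 250 ns to 10,000 ns and a 600 ns offset falls in the 750 ns bucket; the geo defaults give bounds from 0.5 m to 20 m.&lt;&#x2F;p&gt;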
&lt;h2 id=&quot;what-metrics-you-get&quot;&gt;What metrics you get&lt;&#x2F;h2&gt;
&lt;p&gt;The exporter covers every gpsd message type that carries useful data:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;TPV&lt;&#x2F;strong&gt; — latitude, longitude, altitude, speed, track, climb, and all the error estimates (epx, epy, epv, ept, eps, epc). The core positioning data.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;SKY&lt;&#x2F;strong&gt; — DOP values (gdop, hdop, pdop, tdop, vdop, xdop, ydop) and satellite counts. High DOP means bad geometry — you want these low.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Per-satellite&lt;&#x2F;strong&gt; — signal strength, azimuth, elevation, and health status. Labeled by PRN, svid, gnssid, sigid, freqid, and whether the satellite is used in the fix. This is the data that tells you if your antenna has a clear sky view.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;GST&lt;&#x2F;strong&gt; — pseudorange noise and error ellipse data. The statistical quality of your position solution.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;TOFF&lt;&#x2F;strong&gt; — kernel-vs-GPS time offset. Essential if you’re running NTP with a GPS reference clock.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;PPS&lt;&#x2F;strong&gt; — pulse-per-second timing data, with optional histograms for offset distribution.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;OSC&lt;&#x2F;strong&gt; — oscillator status: running, reference, disciplined, delta. For GPSDO setups.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Info&lt;&#x2F;strong&gt; — gpsd version and connected device metadata.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
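&lt;p&gt;Each combination of those labels is its own time series. To make the shape concrete, here is a Rust sketch that renders one sample in Prometheus’s text exposition format; the metric name &lt;code&gt;gpsd_sat_ss&lt;&#x2F;code&gt; and the labels in the note below are hypothetical, so check the exporter’s &lt;code&gt;&#x2F;metrics&lt;&#x2F;code&gt; output for the real ones:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;/// Render one sample in the Prometheus text exposition format. Label values
/// are escaped as the format requires: backslash, double quote and newline.
fn sample(name: &amp;amp;str, labels: &amp;amp;[(&amp;amp;str, &amp;amp;str)], value: f64) -&amp;gt; String {
    let escape = |v: &amp;amp;str| {
        v.replace('\\', "\\\\")
            .replace('"', "\\\"")
            .replace('\n', "\\n")
    };
    let rendered: Vec&amp;lt;String&amp;gt; = labels
        .iter()
        .map(|(k, v)| format!("{k}=\"{}\"", escape(*v)))
        .collect();
    format!("{name}{{{}}} {value}", rendered.join(","))
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Called as &lt;code&gt;sample(&quot;gpsd_sat_ss&quot;, &amp;amp;[(&quot;PRN&quot;, &quot;5&quot;), (&quot;used&quot;, &quot;true&quot;)], 42.0)&lt;&#x2F;code&gt;, it returns &lt;code&gt;gpsd_sat_ss{PRN=&quot;5&quot;,used=&quot;true&quot;} 42&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;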
&lt;p&gt;Metric names are compatible with the Python exporter (&lt;code&gt;gpsd_lat&lt;&#x2F;code&gt;, &lt;code&gt;gpsd_hdop&lt;&#x2F;code&gt;, &lt;code&gt;gpsd_satellites_used&lt;&#x2F;code&gt;, etc.), so existing Grafana dashboards work without changes and the Rust exporter is a drop-in replacement for the Python one.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-this-over-the-alternatives&quot;&gt;Why this over the alternatives&lt;&#x2F;h2&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Feature&lt;&#x2F;th&gt;&lt;th&gt;Python [1]&lt;&#x2F;th&gt;&lt;th&gt;Go [2]&lt;&#x2F;th&gt;&lt;th&gt;Rust [3]&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Crash resilient&lt;&#x2F;td&gt;&lt;td&gt;no&lt;&#x2F;td&gt;&lt;td&gt;no&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;yes&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Streaming&lt;&#x2F;td&gt;&lt;td&gt;yes&lt;&#x2F;td&gt;&lt;td&gt;no&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;yes&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Reconnect&lt;&#x2F;td&gt;&lt;td&gt;buggy&lt;&#x2F;td&gt;&lt;td&gt;no&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;yes&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;No runtime deps&lt;&#x2F;td&gt;&lt;td&gt;no&lt;&#x2F;td&gt;&lt;td&gt;yes&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;yes&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Dashboard compat&lt;&#x2F;td&gt;&lt;td&gt;yes&lt;&#x2F;td&gt;&lt;td&gt;no&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;yes&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;GST&lt;&#x2F;td&gt;&lt;td&gt;no&lt;&#x2F;td&gt;&lt;td&gt;yes&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;yes&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;TOFF&lt;&#x2F;td&gt;&lt;td&gt;no&lt;&#x2F;td&gt;&lt;td&gt;yes&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;yes&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;OSC&lt;&#x2F;td&gt;&lt;td&gt;no&lt;&#x2F;td&gt;&lt;td&gt;yes&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;yes&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;PPS histograms&lt;&#x2F;td&gt;&lt;td&gt;yes&lt;&#x2F;td&gt;&lt;td&gt;no&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;yes&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Geo-offset&lt;&#x2F;td&gt;&lt;td&gt;yes&lt;&#x2F;td&gt;&lt;td&gt;no&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;yes&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Per-sat labels&lt;&#x2F;td&gt;&lt;td&gt;yes&lt;&#x2F;td&gt;&lt;td&gt;partial&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;yes&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Per-sat sigid&lt;&#x2F;td&gt;&lt;td&gt;no&lt;&#x2F;td&gt;&lt;td&gt;yes&lt;&#x2F;td&gt;&lt;td&gt;&lt;strong&gt;yes&lt;&#x2F;strong&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;[1] &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;brendanbank&#x2F;gpsd-prometheus-exporter&quot;&gt;brendanbank&#x2F;gpsd-prometheus-exporter&lt;&#x2F;a&gt;&lt;br&gt;
[2] &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;natesales&#x2F;gpsd-exporter&quot;&gt;natesales&#x2F;gpsd-exporter&lt;&#x2F;a&gt;&lt;br&gt;
[3] &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;ijohanne&#x2F;prometheus-gpsd-exporter&quot;&gt;ijohanne&#x2F;prometheus-gpsd-exporter&lt;&#x2F;a&gt;&lt;&#x2F;p&gt;
&lt;p&gt;The Python exporter is the one most people start with. It has the right metric names and decent coverage, but the &lt;code&gt;gps&lt;&#x2F;code&gt; library dependency is a liability — &lt;code&gt;StopIteration&lt;&#x2F;code&gt; crashes are a known issue with no upstream fix. The Go exporter covers more message types but uses polling instead of streaming, crashes on parse errors, and renames every metric so your dashboards need rewriting.&lt;&#x2F;p&gt;
&lt;p&gt;The Rust exporter takes the superset of both feature sets and adds the one thing neither has: reliability. It stays connected, reconnects when the connection drops, and never exits on bad data.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping up&lt;&#x2F;h2&gt;
&lt;p&gt;The setup is a flake input, a module configuration, and making sure gpsd is configured to actually emit the data you want — which means getting the baud rate right and, if the exporter runs remotely, opening up the network socket. Once that’s wired, your GPS receiver’s position accuracy, satellite coverage, timing offsets, and signal quality all live in Prometheus alongside everything else. The &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;ijohanne&#x2F;prometheus-gpsd-exporter&quot;&gt;project is on GitHub&lt;&#x2F;a&gt; — contributions welcome.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Running Hickory-DNS as a Full Authoritative + Recursive DNS Server on NixOS</title>
        <published>2026-04-02T00:05:00+00:00</published>
        <updated>2026-04-02T00:05:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/hickory-dns-nixos/"/>
        <id>https://perlpimp.net/blog/hickory-dns-nixos/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/hickory-dns-nixos/">&lt;p&gt;BIND has been around since the 1980s. Unbound does recursive resolution well but doesn’t serve authoritative zones. CoreDNS is fine until you want DDNS with TSIG authentication and it’s suddenly less fine. Then there’s hickory-dns — a Rust DNS server that handles authoritative zones, recursive resolution, sqlite-backed DDNS zones, TSIG key verification, and Prometheus metrics. All in one binary.&lt;&#x2F;p&gt;
&lt;p&gt;The catch is it’s not in nixpkgs with the features you need. So you build it from source, wire it into a NixOS module, and generate your zone files declaratively from a host registry. This post walks through the entire working setup — not a toy example, but a production config with multiple VLANs, IPv4 and IPv6 reverse DNS, Kea DHCP-DDNS integration, systemd hardening, and a custom Grafana dashboard built from scratch because nobody’s made one yet.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-hickory-dns&quot;&gt;Why hickory-dns&lt;&#x2F;h2&gt;
&lt;p&gt;The pitch is short:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Rust&lt;&#x2F;strong&gt; — memory-safe, single static-ish binary, no garbage collector pauses&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Built-in recursor&lt;&#x2F;strong&gt; — one process handles both authoritative and recursive queries&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Prometheus metrics&lt;&#x2F;strong&gt; — native &lt;code&gt;&#x2F;metrics&lt;&#x2F;code&gt; endpoint, no sidecar exporter&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;SQLite-backed DDNS zones&lt;&#x2F;strong&gt; — RFC 2136 dynamic updates with TSIG authentication, journal files for durability&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;DNSSEC support&lt;&#x2F;strong&gt; — ring-based crypto for TSIG key verification&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;One binary replaces what would otherwise be BIND (or Unbound + a separate authoritative server) plus a Prometheus exporter. The config is TOML, which is a nice change from BIND’s syntax.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;building-from-source-in-nix&quot;&gt;Building from source in Nix&lt;&#x2F;h2&gt;
&lt;p&gt;Hickory-dns isn’t packaged in nixpkgs with the feature combination you need — sqlite, recursor, prometheus-metrics, and dnssec-ring. Build it inline:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;hickory-dns = pkgs.rustPlatform.buildRustPackage rec {
  pname = &amp;quot;hickory-dns&amp;quot;;
  version = &amp;quot;0.26.0-beta.2&amp;quot;;
  src = pkgs.fetchFromGitHub {
    owner = &amp;quot;hickory-dns&amp;quot;;
    repo = &amp;quot;hickory-dns&amp;quot;;
    hash = &amp;quot;sha256-7kra6MbLcv0P6iiUJ+hQ0ezqgXh&amp;#x2F;1KskCrZvFYDqiXQ=&amp;quot;;
    rev = &amp;quot;v${version}&amp;quot;;
  };
  cargoHash = &amp;quot;sha256-FfckN+qhSqbc8jnL0xThdAMQEgluocSY1ksEyT8rFFY=&amp;quot;;
  buildAndTestSubdir = &amp;quot;bin&amp;quot;;
  buildFeatures = [
    &amp;quot;sqlite&amp;quot; &amp;quot;resolver&amp;quot; &amp;quot;recursor&amp;quot;
    &amp;quot;prometheus-metrics&amp;quot; &amp;quot;dnssec-ring&amp;quot;
  ];
  nativeBuildInputs = [ pkgs.pkg-config ];
  buildInputs = [ pkgs.openssl pkgs.sqlite ];
  doCheck = false;
  meta.mainProgram = &amp;quot;hickory-dns&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A few things to note. &lt;code&gt;buildAndTestSubdir = &quot;bin&quot;&lt;&#x2F;code&gt; is critical: the workspace has many crates and you only want the server binary. The &lt;code&gt;buildFeatures&lt;&#x2F;code&gt; list is the whole reason you’re building from source: &lt;code&gt;sqlite&lt;&#x2F;code&gt; for DDNS journal zones, &lt;code&gt;recursor&lt;&#x2F;code&gt; for upstream resolution, &lt;code&gt;prometheus-metrics&lt;&#x2F;code&gt; for monitoring, and &lt;code&gt;dnssec-ring&lt;&#x2F;code&gt; for TSIG key verification on dynamic updates. Tests are disabled because they require network access, so set &lt;code&gt;doCheck = false&lt;&#x2F;code&gt; and move on.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;generating-zone-files-from-a-host-registry&quot;&gt;Generating zone files from a host registry&lt;&#x2F;h2&gt;
&lt;p&gt;Hard-coding zone files is fine for five hosts. At twenty, with multiple VLANs, reverse zones, and IPv6, it’s a maintenance disaster. The better approach is generating everything from a single Nix attrset — a host registry:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;hosts = {
  myhost  = { ip = &amp;quot;10.0.1.10&amp;quot;; mac = &amp;quot;aa:bb:cc:dd:ee:ff&amp;quot;; };
  another = { ip = &amp;quot;10.0.2.20&amp;quot;; ip6 = &amp;quot;fd00:255:101::20&amp;quot;;
              dns = [ &amp;quot;another&amp;quot; &amp;quot;alias&amp;quot; ]; };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Every host has an IP and optional fields — MAC for DHCP reservations, &lt;code&gt;ip6&lt;&#x2F;code&gt; for IPv6, &lt;code&gt;dns&lt;&#x2F;code&gt; for additional names. From this, you generate forward zones, reverse zones, and DHCP configs. One source of truth, zero drift.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;forward-zone-a-and-aaaa-records&quot;&gt;Forward zone (A and AAAA records)&lt;&#x2F;h3&gt;
&lt;p&gt;The SOA boilerplate goes into a helper:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;mkSoa = zone: &amp;#x27;&amp;#x27;
  $ORIGIN ${zone}.
  $TTL 3600
  @ IN SOA ns1.${domain}. admin.${domain}. (
      1       ; serial
      3600    ; refresh
      900     ; retry
      604800  ; expire
      300     ; minimum
  )
  @ IN NS ns1.${domain}.
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then you filter and map:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;isInZone = name: !(lib.hasInfix &amp;quot;.&amp;quot; name);

hostARecords = lib.flatten (lib.mapAttrsToList (name: host:
  let names = builtins.filter isInZone (hostDnsNames name host);
  in map (n: &amp;quot;${n} IN A ${host.ip}&amp;quot;) names
) network.hosts);

hostAAAARecords = lib.optionals network.enableIPv6ULA (lib.flatten (
  lib.mapAttrsToList (name: host:
    let names = builtins.filter isInZone (hostDnsNames name host);
    in map (n: &amp;quot;${n} IN AAAA ${host.ip6}&amp;quot;) names
  ) hostsWithIp6));

forwardZoneContent =
  mkSoa domain
  + &amp;quot;ns1 IN A ${gateway.ip}\n&amp;quot;
  + lib.concatStringsSep &amp;quot;\n&amp;quot; (hostARecords ++ extraARecords ++ hostAAAARecords)
  + &amp;quot;\n&amp;quot;;

forwardZoneFile = pkgs.writeText &amp;quot;${domain}.zone&amp;quot; forwardZoneContent;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;isInZone&lt;&#x2F;code&gt; filter catches an easy mistake — names with dots in them (like &lt;code&gt;k8s-master.local&lt;&#x2F;code&gt;) belong in other domains and can’t be served from this authoritative zone. Filter them out at the Nix level instead of debugging broken DNS resolution later.&lt;&#x2F;p&gt;
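&lt;p&gt;For readers more at home outside Nix, the same filter-then-map step looks like this in Rust. It is a sketch: the &lt;code&gt;Host&lt;&#x2F;code&gt; struct is hypothetical, and it assumes a non-empty &lt;code&gt;dns&lt;&#x2F;code&gt; list replaces the host’s key as its set of names, since the real &lt;code&gt;hostDnsNames&lt;&#x2F;code&gt; helper isn’t shown in this post:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;use std::collections::BTreeMap;

/// Hypothetical registry entry mirroring the Nix attrset fields used here.
struct Host {
    ip: &amp;amp;'static str,
    /// Extra names; assumed to replace the key when non-empty.
    dns: Vec&amp;lt;&amp;amp;'static str&amp;gt;,
}

/// Only single-label names belong in this zone.
fn is_in_zone(name: &amp;amp;str) -&amp;gt; bool {
    !name.contains('.')
}

/// One "NAME IN A IP" line per in-zone name; dotted names are dropped.
fn a_records(hosts: &amp;amp;BTreeMap&amp;lt;&amp;amp;str, Host&amp;gt;) -&amp;gt; Vec&amp;lt;String&amp;gt; {
    let mut out = Vec::new();
    for (key, host) in hosts {
        let names = if host.dns.is_empty() { vec![*key] } else { host.dns.clone() };
        for name in names.into_iter().filter(|n| is_in_zone(n)) {
            out.push(format!("{name} IN A {}", host.ip));
        }
    }
    out
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A host registered as &lt;code&gt;k8s&lt;&#x2F;code&gt; with &lt;code&gt;dns = [ &quot;k8s&quot; &quot;k8s-master.local&quot; ]&lt;&#x2F;code&gt; yields a single &lt;code&gt;k8s IN A ...&lt;&#x2F;code&gt; line; the dotted alias never reaches the zone file.&lt;&#x2F;p&gt;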
&lt;h3 id=&quot;ipv4-reverse-zones-ptr-records-per-24-subnet&quot;&gt;IPv4 reverse zones (PTR records per &#x2F;24 subnet)&lt;&#x2F;h3&gt;
&lt;p&gt;Reverse zones are per-subnet, so you group hosts by their first three octets:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;hostsBySubnet = lib.groupBy (h:
  let parts = lib.splitString &amp;quot;.&amp;quot; h.ip;
  in &amp;quot;${builtins.elemAt parts 0}.${builtins.elemAt parts 1}.${builtins.elemAt parts 2}&amp;quot;
) allHostEntries;

mkReverseZone = subnet: entries:
  let
    parts = lib.splitString &amp;quot;.&amp;quot; subnet;
    revSubnet = &amp;quot;${builtins.elemAt parts 2}.${builtins.elemAt parts 1}.${builtins.elemAt parts 0}&amp;quot;;
    zoneName = &amp;quot;${revSubnet}.in-addr.arpa&amp;quot;;
    records = map (h:
      let lastOctet = lib.last (lib.splitString &amp;quot;.&amp;quot; h.ip);
      in &amp;quot;${lastOctet} IN PTR ${h.dnsName}.${domain}.&amp;quot;
    ) entries;
  in {
    name = zoneName;
    content = mkSoa zoneName
      + &amp;quot;ns1.${domain}. IN A ${gateway.ip}\n&amp;quot;
      + lib.concatStringsSep &amp;quot;\n&amp;quot; records + &amp;quot;\n&amp;quot;;
  };

reverseZones = lib.mapAttrsToList mkReverseZone hostsBySubnet;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Each &#x2F;24 subnet gets its own &lt;code&gt;.in-addr.arpa&lt;&#x2F;code&gt; zone with PTR records pointing back to the canonical hostname. Add a host to the registry, rebuild, and forward and reverse DNS update together.&lt;&#x2F;p&gt;
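&lt;p&gt;Parsing into an address type instead of splitting strings gives the same result and rejects malformed IPs up front. A Rust sketch of the zone-name and PTR derivation for a single host (function names are illustrative):&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;use std::net::Ipv4Addr;

/// Reverse zone of the /24 containing `ip`: 10.0.1.10 -&amp;gt; "1.0.10.in-addr.arpa".
fn reverse_zone_v4(ip: Ipv4Addr) -&amp;gt; String {
    let [a, b, c, _] = ip.octets();
    format!("{c}.{b}.{a}.in-addr.arpa")
}

/// PTR line relative to that zone: the last octet points at the host's FQDN.
fn ptr_record_v4(ip: Ipv4Addr, host: &amp;amp;str, domain: &amp;amp;str) -&amp;gt; String {
    let [_, _, _, d] = ip.octets();
    format!("{d} IN PTR {host}.{domain}.")
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For &lt;code&gt;10.0.1.10&lt;&#x2F;code&gt; named &lt;code&gt;myhost&lt;&#x2F;code&gt; in &lt;code&gt;example.net&lt;&#x2F;code&gt;, this yields the zone &lt;code&gt;1.0.10.in-addr.arpa&lt;&#x2F;code&gt; and the record &lt;code&gt;10 IN PTR myhost.example.net.&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;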
&lt;h3 id=&quot;ipv6-reverse-zones-nibble-based-ptr-records&quot;&gt;IPv6 reverse zones (nibble-based PTR records)&lt;&#x2F;h3&gt;
&lt;p&gt;IPv6 reverse DNS is where things get tedious. Each address expands to individual hex nibbles, reversed and dot-separated under &lt;code&gt;.ip6.arpa&lt;&#x2F;code&gt;. A helper does the expansion:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;ip6Nibbles = addr:
  let expanded = network.expandIp6 addr;
  in lib.stringToCharacters (lib.replaceStrings [&amp;quot;:&amp;quot;] [&amp;quot;&amp;quot;] expanded);

ip6ZoneName = addr:
  let
    nibbles = ip6Nibbles addr;
    rev12 = lib.concatStringsSep &amp;quot;.&amp;quot; (lib.reverseList (lib.take 12 nibbles));
  in &amp;quot;${rev12}.ip6.arpa&amp;quot;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The first 12 nibbles (48 bits) form the zone name — this covers a &#x2F;48 prefix. The remaining 20 nibbles become the host part of each PTR record:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;mkIp6ReverseZone = zoneName: entries:
  let
    records = map (h:
      let
        nibbles = ip6Nibbles h.ip6;
        allRev = lib.reverseList nibbles;
        relName = lib.concatStringsSep &amp;quot;.&amp;quot; (lib.take 20 allRev);
      in &amp;quot;${relName} IN PTR ${h.dnsName}.${domain}.&amp;quot;
    ) entries;
  in {
    name = zoneName;
    content = mkSoa zoneName
      + &amp;quot;ns1.${domain}. IN A ${gateway.ip}\n&amp;quot;
      + lib.concatStringsSep &amp;quot;\n&amp;quot; records + &amp;quot;\n&amp;quot;;
  };

ip6ReverseZones = lib.optionals network.enableIPv6ULA
  (lib.mapAttrsToList mkIp6ReverseZone hostsByIp6Zone);
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Manually maintaining nibble-format PTR records for even a dozen IPv6 hosts is a recipe for typos. Generating them from the registry makes it mechanical.&lt;&#x2F;p&gt;
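&lt;p&gt;Outside Nix, the standard library can do the expansion for you: Rust’s &lt;code&gt;Ipv6Addr::octets()&lt;&#x2F;code&gt; always returns the full 16 bytes, so &lt;code&gt;::&lt;&#x2F;code&gt; compression is already undone. A sketch of the same &#x2F;48 split (function names are illustrative):&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;use std::net::Ipv6Addr;

/// All 32 hex nibbles of an address, most significant first.
/// octets() always returns the full 16 bytes, so "::" is already expanded.
fn nibbles(addr: Ipv6Addr) -&amp;gt; Vec&amp;lt;char&amp;gt; {
    addr.octets()
        .iter()
        .flat_map(|&amp;amp;b| [b &amp;gt;&amp;gt; 4, b &amp;amp; 0x0f])
        .map(|n| char::from_digit(u32::from(n), 16).unwrap())
        .collect()
}

/// Reverse the given nibbles and join them with dots.
fn reversed_labels(nibbles: &amp;amp;[char]) -&amp;gt; String {
    let labels: Vec&amp;lt;String&amp;gt; = nibbles.iter().rev().map(|c| c.to_string()).collect();
    labels.join(".")
}

/// Zone for the /48 containing `addr`: the first 12 nibbles, reversed.
fn zone_48(addr: Ipv6Addr) -&amp;gt; String {
    format!("{}.ip6.arpa", reversed_labels(&amp;amp;nibbles(addr)[..12]))
}

/// Owner name relative to that zone: the remaining 20 nibbles, reversed.
fn ptr_owner_48(addr: Ipv6Addr) -&amp;gt; String {
    reversed_labels(&amp;amp;nibbles(addr)[12..])
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For &lt;code&gt;fd00:255:101::20&lt;&#x2F;code&gt; this gives the zone &lt;code&gt;1.0.1.0.5.5.2.0.0.0.d.f.ip6.arpa&lt;&#x2F;code&gt; and an owner name of &lt;code&gt;0.2&lt;&#x2F;code&gt; followed by eighteen &lt;code&gt;.0&lt;&#x2F;code&gt; labels.&lt;&#x2F;p&gt;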
&lt;h3 id=&quot;splitting-zones-static-vs-ddns-updatable&quot;&gt;Splitting zones: static vs DDNS-updatable&lt;&#x2F;h3&gt;
&lt;p&gt;Here’s the key architectural decision. Reverse zones for subnets with DHCP clients need to accept dynamic updates — Kea D2 will send RFC 2136 updates to create PTR records when leases are handed out. But reverse zones for static-only subnets should be plain zone files, read-only. You split them at the Nix level:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;ddnsSubnets = [ &amp;quot;10.0.1&amp;quot; &amp;quot;10.0.2&amp;quot; &amp;quot;10.0.3&amp;quot; ];

zoneToSubnet = zoneName:
  let
    stripped = lib.removeSuffix &amp;quot;.in-addr.arpa&amp;quot; zoneName;
    parts = lib.splitString &amp;quot;.&amp;quot; stripped;
  in &amp;quot;${builtins.elemAt parts 2}.${builtins.elemAt parts 1}.${builtins.elemAt parts 0}&amp;quot;;

isDdnsReverseZone = z: builtins.elem (zoneToSubnet z.name) ddnsSubnets;
ddnsReverseZones = builtins.filter isDdnsReverseZone reverseZones;
staticReverseZones = builtins.filter (z: !(isDdnsReverseZone z)) reverseZones;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Same split for IPv6 — DHCP-served subnets get sqlite-backed DDNS zones, management subnets get static files. This matters because DDNS zones require sqlite storage, TSIG key configuration, and journal files. You don’t want that overhead on zones that never change.&lt;&#x2F;p&gt;
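&lt;p&gt;The IPv4 classification as a Rust sketch (function names are illustrative), with one deliberate difference from the Nix version: a name that isn’t a three-label &lt;code&gt;in-addr.arpa&lt;&#x2F;code&gt; zone yields &lt;code&gt;None&lt;&#x2F;code&gt; rather than failing on a missing list element:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;/// "1.0.10.in-addr.arpa" -&amp;gt; Some("10.0.1"); None for anything that is not
/// a three-label IPv4 reverse zone.
fn zone_to_subnet(zone: &amp;amp;str) -&amp;gt; Option&amp;lt;String&amp;gt; {
    let stripped = zone.strip_suffix(".in-addr.arpa")?;
    let parts: Vec&amp;lt;&amp;amp;str&amp;gt; = stripped.split('.').collect();
    match parts.as_slice() {
        [c, b, a] =&amp;gt; Some(format!("{a}.{b}.{c}")),
        _ =&amp;gt; None,
    }
}

/// A reverse zone takes dynamic updates only if its subnet is listed.
fn is_ddns_reverse_zone(zone: &amp;amp;str, ddns_subnets: &amp;amp;[&amp;amp;str]) -&amp;gt; bool {
    zone_to_subnet(zone).map_or(false, |s| ddns_subnets.contains(&amp;amp;s.as_str()))
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;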
&lt;p&gt;The DDNS forward zones start empty — they’re populated at runtime by Kea:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;ddnsZoneContent =
  mkSoa &amp;quot;dhcp.${domain}&amp;quot;
  + &amp;quot;ns1.${domain}. IN A ${gateway.ip}\n&amp;quot;;

guestZoneContent =
  mkSoa &amp;quot;guest.${domain}&amp;quot;
  + &amp;quot;ns1.${domain}. IN A ${gateway.ip}\n&amp;quot;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;toml-configuration&quot;&gt;TOML configuration&lt;&#x2F;h2&gt;
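&lt;p&gt;With all the zone files generated, the TOML config ties everything together. There are four categories of zones (static and DDNS, each forward and reverse) plus a recursor for upstream resolution. The excerpt below shows the two forward zone types and the recursor; the reverse zones follow the same two patterns:&lt;&#x2F;p&gt;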
&lt;pre data-lang=&quot;toml&quot; class=&quot;language-toml &quot;&gt;&lt;code class=&quot;language-toml&quot; data-lang=&quot;toml&quot;&gt;listen_addrs_ipv4 = [&amp;quot;10.0.0.1&amp;quot;, &amp;quot;10.0.1.1&amp;quot;, &amp;quot;10.0.2.1&amp;quot;, &amp;quot;127.0.0.1&amp;quot;]
listen_addrs_ipv6 = [&amp;quot;fd00:255:100::1&amp;quot;, &amp;quot;::1&amp;quot;]
listen_port = 53
directory = &amp;quot;&amp;#x2F;var&amp;#x2F;lib&amp;#x2F;hickory-dns&amp;quot;
tcp_request_timeout = 5
allow_networks = [&amp;quot;10.0.0.0&amp;#x2F;8&amp;quot;, &amp;quot;127.0.0.0&amp;#x2F;8&amp;quot;, &amp;quot;fd00::&amp;#x2F;8&amp;quot;, &amp;quot;::1&amp;#x2F;128&amp;quot;]
prometheus_listen_addr = &amp;quot;127.0.0.1:9153&amp;quot;

# Static authoritative forward zone
[[zones]]
zone = &amp;quot;example.net.&amp;quot;
zone_type = &amp;quot;Primary&amp;quot;
file = &amp;quot;&amp;#x2F;nix&amp;#x2F;store&amp;#x2F;...-example.net.zone&amp;quot;

# DDNS forward zone (sqlite + TSIG)
[[zones]]
zone = &amp;quot;dhcp.example.net.&amp;quot;
zone_type = &amp;quot;Primary&amp;quot;

[zones.stores]
type = &amp;quot;sqlite&amp;quot;
zone_path = &amp;quot;&amp;#x2F;nix&amp;#x2F;store&amp;#x2F;...-dhcp.example.net.zone&amp;quot;
journal_path = &amp;quot;&amp;#x2F;var&amp;#x2F;lib&amp;#x2F;hickory-dns&amp;#x2F;dhcp.example.net.jrnl&amp;quot;
allow_update = true

[[zones.stores.tsig_keys]]
name = &amp;quot;kea-ddns-key.&amp;quot;
algorithm = &amp;quot;hmac-sha256&amp;quot;
key_file = &amp;quot;&amp;#x2F;var&amp;#x2F;lib&amp;#x2F;hickory-dns&amp;#x2F;tsig-key.bin&amp;quot;

# Root hints recursor (upstream resolution)
[[zones]]
zone = &amp;quot;.&amp;quot;
zone_type = &amp;quot;External&amp;quot;

[zones.stores]
type = &amp;quot;recursor&amp;quot;
roots = &amp;quot;&amp;#x2F;nix&amp;#x2F;store&amp;#x2F;...-root.hints&amp;quot;
ns_cache_size = 1024
record_cache_size = 1048576
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A few important details. You must list every interface IP explicitly in &lt;code&gt;listen_addrs_ipv4&lt;&#x2F;code&gt;, because hickory-dns doesn’t support &lt;code&gt;0.0.0.0&lt;&#x2F;code&gt; binding reliably. The &lt;code&gt;allow_networks&lt;&#x2F;code&gt; list restricts which clients can query. DDNS zones use &lt;code&gt;type = &quot;sqlite&quot;&lt;&#x2F;code&gt; with &lt;code&gt;allow_update = true&lt;&#x2F;code&gt; and a TSIG key for authentication. The root hints file comes from &lt;code&gt;pkgs.dns-root-data&lt;&#x2F;code&gt;. And &lt;code&gt;record_cache_size = 1048576&lt;&#x2F;code&gt; gives the recursor a generous cache of 2&lt;sup&gt;20&lt;&#x2F;sup&gt; (roughly a million) entries.&lt;&#x2F;p&gt;
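&lt;p&gt;The entries in &lt;code&gt;allow_networks&lt;&#x2F;code&gt; are ordinary CIDR prefixes: a client is admitted when its address, masked to the prefix length, equals the network address. A Rust sketch of that test, to illustrate the semantics only (it is not hickory-dns’s implementation):&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;use std::net::IpAddr;

/// True if `addr` lies inside `network/prefix`. Mixed address families and
/// out-of-range prefixes never match. Illustrative only; not hickory-dns code.
fn in_network(addr: IpAddr, network: IpAddr, prefix: u32) -&amp;gt; bool {
    match (addr, network) {
        (IpAddr::V4(a), IpAddr::V4(n)) if prefix &amp;lt;= 32 =&amp;gt; {
            // checked_shl returns None for a shift of 32, so prefix 0 matches everything.
            let mask = u32::MAX.checked_shl(32 - prefix).unwrap_or(0);
            (u32::from(a) &amp;amp; mask) == (u32::from(n) &amp;amp; mask)
        }
        (IpAddr::V6(a), IpAddr::V6(n)) if prefix &amp;lt;= 128 =&amp;gt; {
            let mask = u128::MAX.checked_shl(128 - prefix).unwrap_or(0);
            (u128::from(a) &amp;amp; mask) == (u128::from(n) &amp;amp; mask)
        }
        _ =&amp;gt; false,
    }
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Against the list above, &lt;code&gt;10.0.2.20&lt;&#x2F;code&gt; matches &lt;code&gt;10.0.0.0&#x2F;8&lt;&#x2F;code&gt; and &lt;code&gt;fd00:255:101::20&lt;&#x2F;code&gt; matches &lt;code&gt;fd00::&#x2F;8&lt;&#x2F;code&gt;, while &lt;code&gt;192.168.1.5&lt;&#x2F;code&gt; matches none of the entries.&lt;&#x2F;p&gt;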
&lt;p&gt;The Nix side generates this TOML with string interpolation, rendering static reverse zones as plain file zones and DDNS reverse zones with the sqlite&#x2F;TSIG configuration:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# Static reverse zones — plain file, read-only
staticReverseZoneToml = lib.concatMapStringsSep &amp;quot;\n&amp;quot; (z: &amp;#x27;&amp;#x27;
  [[zones]]
  zone = &amp;quot;${z.name}.&amp;quot;
  zone_type = &amp;quot;Primary&amp;quot;
  file = &amp;quot;${reverseZoneFilesByName.${z.name}}&amp;quot;
&amp;#x27;&amp;#x27;) staticReverseZones;

# DDNS reverse zones — sqlite + journal + TSIG key
ddnsReverseZoneToml = lib.concatMapStringsSep &amp;quot;\n&amp;quot; (z: &amp;#x27;&amp;#x27;
  [[zones]]
  zone = &amp;quot;${z.name}.&amp;quot;
  zone_type = &amp;quot;Primary&amp;quot;

  [zones.stores]
  type = &amp;quot;sqlite&amp;quot;
  zone_path = &amp;quot;${reverseZoneFilesByName.${z.name}}&amp;quot;
  journal_path = &amp;quot;${dataDir}&amp;#x2F;${z.name}.jrnl&amp;quot;
  allow_update = true

  ${tsigKeyToml}
&amp;#x27;&amp;#x27;) ddnsReverseZones;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Same pattern for IPv6 — static and DDNS variants, generated from the zone split you did earlier.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;systemd-service-with-hardening&quot;&gt;systemd service with hardening&lt;&#x2F;h2&gt;
&lt;p&gt;The service definition is straightforward but the hardening is thorough:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;users.users.hickory-dns = {
  isSystemUser = true;
  group = &amp;quot;hickory-dns&amp;quot;;
};
users.groups.hickory-dns = {};

systemd.services.hickory-dns = {
  description = &amp;quot;hickory-dns DNS server&amp;quot;;
  after = [ &amp;quot;network-online.target&amp;quot; ];
  wants = [ &amp;quot;network-online.target&amp;quot; ];
  wantedBy = [ &amp;quot;multi-user.target&amp;quot; ];

  serviceConfig = {
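    # Decode the base64 TSIG secret into the state directory before startup.
    # An &amp;#x27;&amp;#x27; string keeps the backslash line continuations intact for the shell.
    ExecStartPre = pkgs.writeShellScript &amp;quot;hickory-dns-tsig-key&amp;quot; &amp;#x27;&amp;#x27;
      ${pkgs.coreutils}&amp;#x2F;bin&amp;#x2F;base64 -d \
        &amp;lt; ${config.sops.secrets.hickory_dns_private_key.path} \
        &amp;gt; ${tsigKeyRawPath}
    &amp;#x27;&amp;#x27;;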
    ExecStart = &amp;quot;${hickory-dns}&amp;#x2F;bin&amp;#x2F;hickory-dns -c ${configFile}&amp;quot;;
    User = &amp;quot;hickory-dns&amp;quot;;
    Group = &amp;quot;hickory-dns&amp;quot;;
    StateDirectory = &amp;quot;hickory-dns&amp;quot;;
    AmbientCapabilities = [ &amp;quot;CAP_NET_BIND_SERVICE&amp;quot; ];
    CapabilityBoundingSet = [ &amp;quot;CAP_NET_BIND_SERVICE&amp;quot; ];
    LockPersonality = true;
    MemoryDenyWriteExecute = true;
    NoNewPrivileges = true;
    PrivateDevices = true;
    PrivateTmp = true;
    ProtectClock = true;
    ProtectControlGroups = true;
    ProtectHome = true;
    ProtectHostname = true;
    ProtectKernelLogs = true;
    ProtectKernelModules = true;
    ProtectKernelTunables = true;
    ProtectSystem = &amp;quot;strict&amp;quot;;
    ReadWritePaths = [ dataDir ];
    RestrictAddressFamilies = [ &amp;quot;AF_INET&amp;quot; &amp;quot;AF_INET6&amp;quot; &amp;quot;AF_UNIX&amp;quot; ];
    RestrictNamespaces = true;
    RestrictRealtime = true;
    SystemCallArchitectures = &amp;quot;native&amp;quot;;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;ExecStartPre&lt;&#x2F;code&gt; step decodes the TSIG key from a sops-nix secret before the server starts. The service user writes the raw key file into the state directory. &lt;code&gt;CAP_NET_BIND_SERVICE&lt;&#x2F;code&gt; is the only capability needed (for port 53). &lt;code&gt;StateDirectory = &quot;hickory-dns&quot;&lt;&#x2F;code&gt; tells systemd to create &lt;code&gt;&#x2F;var&#x2F;lib&#x2F;hickory-dns&lt;&#x2F;code&gt; owned by the service user. &lt;code&gt;ProtectSystem = &quot;strict&quot;&lt;&#x2F;code&gt; plus &lt;code&gt;ReadWritePaths&lt;&#x2F;code&gt; means only the state dir is writable, and that is where the sqlite journals and the decoded TSIG key go. Everything else is locked down.&lt;&#x2F;p&gt;
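&lt;p&gt;The snippet above leaves two settings implicit. They are sketched here as assumptions about the rest of the configuration.&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ExecStartPre&lt;&#x2F;code&gt; runs as &lt;code&gt;hickory-dns&lt;&#x2F;code&gt;, but sops-nix installs secrets owned by root with mode 0400 unless told otherwise. The secret therefore has to be handed to that user.&lt;&#x2F;li&gt;
&lt;li&gt;systemd’s defaults (a 0022 umask and a 0755 state directory) would leave the decoded key readable by other local users. Keeping it private to the service takes an explicit umask and directory mode.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# Let the service user read the decrypted secret that ExecStartPre decodes
sops.secrets.hickory_dns_private_key.owner = &amp;quot;hickory-dns&amp;quot;;

systemd.services.hickory-dns.serviceConfig = {
  # Files the service creates, including the decoded key, become mode 0600
  UMask = &amp;quot;0077&amp;quot;;
  # Keep &amp;#x2F;var&amp;#x2F;lib&amp;#x2F;hickory-dns closed to other local users
  StateDirectoryMode = &amp;quot;0700&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;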
&lt;h2 id=&quot;kea-dhcp-ddns-integration&quot;&gt;Kea DHCP-DDNS integration&lt;&#x2F;h2&gt;
&lt;p&gt;This is where most of the debugging happens. Three pieces need to align: Kea’s DHCP servers, the D2 daemon, and hickory-dns. Get one wrong and you’ll have hostnames that resolve for some devices but not others, or PTR records that point to the wrong zone.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;skipping-static-reservations&quot;&gt;Skipping static reservations&lt;&#x2F;h3&gt;
&lt;p&gt;Hosts with static DNS entries in the forward zone must not get DDNS records. Otherwise Kea creates &lt;code&gt;myhost.dhcp.example.net&lt;&#x2F;code&gt; entries that shadow the authoritative &lt;code&gt;myhost.example.net&lt;&#x2F;code&gt; records — or worse, you get duplicate names in different zones with different TTLs and spend an evening figuring out why &lt;code&gt;dig&lt;&#x2F;code&gt; returns different answers depending on which resolver cache you hit.&lt;&#x2F;p&gt;
&lt;p&gt;The fix: tag all static reservations with a &lt;code&gt;SKIP_DDNS&lt;&#x2F;code&gt; client class:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;skipDdnsReservations = rs: map (r: r &amp;#x2F;&amp;#x2F; { client-classes = [ &amp;quot;SKIP_DDNS&amp;quot; ]; }) rs;

# Applied to both DHCPv4 and DHCPv6 reservations
reservations = skipDdnsReservations network.dhcpReservations;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This works because Kea’s &lt;code&gt;ddns_tuning&lt;&#x2F;code&gt; hooks library respects the class and suppresses DDNS updates for matching clients:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;hooks-libraries = [{
  library = &amp;quot;${pkgs.kea}&amp;#x2F;lib&amp;#x2F;kea&amp;#x2F;hooks&amp;#x2F;libdhcp_ddns_tuning.so&amp;quot;;
  parameters = {};
}];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Load this hook in both &lt;code&gt;dhcp4&lt;&#x2F;code&gt; and &lt;code&gt;dhcp6&lt;&#x2F;code&gt; settings.&lt;&#x2F;p&gt;
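&lt;p&gt;Below is a sketch of both servers sharing one hook definition. It assumes the settings go through the NixOS kea module, and &lt;code&gt;ddnsTuningHook&lt;&#x2F;code&gt; is an illustrative name. Declaring &lt;code&gt;SKIP_DDNS&lt;&#x2F;code&gt; in &lt;code&gt;client-classes&lt;&#x2F;code&gt; is an optional extra, not part of the setup above. The class has no test expression, so membership still comes only from the tagged reservations; the declaration simply keeps reservations from pointing at an undefined class name.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;let
  # One hook definition shared by both servers (illustrative name)
  ddnsTuningHook = {
    library = &amp;quot;${pkgs.kea}&amp;#x2F;lib&amp;#x2F;kea&amp;#x2F;hooks&amp;#x2F;libdhcp_ddns_tuning.so&amp;quot;;
    parameters = {};
  };
  # Optional: declare the class; no &amp;quot;test&amp;quot; key, so only reservations assign it
  skipDdnsClass = { name = &amp;quot;SKIP_DDNS&amp;quot;; };
in {
  services.kea.dhcp4.settings = {
    hooks-libraries = [ ddnsTuningHook ];
    client-classes = [ skipDdnsClass ];
  };
  services.kea.dhcp6.settings = {
    hooks-libraries = [ ddnsTuningHook ];
    client-classes = [ skipDdnsClass ];
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;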
&lt;h3 id=&quot;d2-config-with-tsig-and-reverse-dns&quot;&gt;D2 config with TSIG and reverse DNS&lt;&#x2F;h3&gt;
&lt;p&gt;The Kea D2 daemon handles the actual RFC 2136 updates. Its config contains the TSIG key secret, so it’s rendered via a sops template:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;systemd.services.kea-dhcp-ddns-server = {
  after = [ &amp;quot;hickory-dns.service&amp;quot; ];
  wants = [ &amp;quot;hickory-dns.service&amp;quot; ];
  serviceConfig.ExecStart = lib.mkForce
    &amp;quot;${pkgs.kea}&amp;#x2F;bin&amp;#x2F;kea-dhcp-ddns -c ${config.sops.templates.&amp;quot;kea-dhcp-ddns.conf&amp;quot;.path}&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The template itself defines forward and reverse DDNS domains:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;sops.templates.&amp;quot;kea-dhcp-ddns.conf&amp;quot; = {
  mode = &amp;quot;0444&amp;quot;;
  restartUnits = [ &amp;quot;kea-dhcp-ddns-server.service&amp;quot; ];
  content = builtins.toJSON {
    DhcpDdns = {
      ip-address = &amp;quot;127.0.0.1&amp;quot;;
      port = 53001;
      dns-server-timeout = 3000;

      tsig-keys = [{
        name = &amp;quot;kea-ddns-key.&amp;quot;;
        algorithm = &amp;quot;HMAC-SHA256&amp;quot;;
        secret = config.sops.placeholder.hickory_dns_private_key;
      }];

      forward-ddns = {
        ddns-domains = [
          {
            name = &amp;quot;dhcp.example.net.&amp;quot;;
            key-name = &amp;quot;kea-ddns-key.&amp;quot;;
            dns-servers = [{ ip-address = &amp;quot;127.0.0.1&amp;quot;; port = 53; }];
          }
          {
            name = &amp;quot;guest.example.net.&amp;quot;;
            key-name = &amp;quot;kea-ddns-key.&amp;quot;;
            dns-servers = [{ ip-address = &amp;quot;127.0.0.1&amp;quot;; port = 53; }];
          }
        ];
      };

      reverse-ddns = {
        ddns-domains = [
          # IPv4 reverse zones for DHCP subnets
          {
            name = &amp;quot;1.0.10.in-addr.arpa.&amp;quot;;
            key-name = &amp;quot;kea-ddns-key.&amp;quot;;
            dns-servers = [{ ip-address = &amp;quot;127.0.0.1&amp;quot;; port = 53; }];
          }
          {
            name = &amp;quot;2.0.10.in-addr.arpa.&amp;quot;;
            key-name = &amp;quot;kea-ddns-key.&amp;quot;;
            dns-servers = [{ ip-address = &amp;quot;127.0.0.1&amp;quot;; port = 53; }];
          }
          # IPv6 reverse zones (generated from ULA prefix)
          {
            name = wiredIp6RevZone;
            key-name = &amp;quot;kea-ddns-key.&amp;quot;;
            dns-servers = [{ ip-address = &amp;quot;127.0.0.1&amp;quot;; port = 53; }];
          }
          {
            name = wifiIp6RevZone;
            key-name = &amp;quot;kea-ddns-key.&amp;quot;;
            dns-servers = [{ ip-address = &amp;quot;127.0.0.1&amp;quot;; port = 53; }];
          }
        ];
      };
    };
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Gotcha worth highlighting:&lt;&#x2F;strong&gt; because &lt;code&gt;ExecStart&lt;&#x2F;code&gt; is overridden with &lt;code&gt;lib.mkForce&lt;&#x2F;code&gt; to point at the sops template, the NixOS kea module’s own restart triggers no longer cover the actual config. Without &lt;code&gt;restartUnits&lt;&#x2F;code&gt; on the sops template, changing the D2 config deploys a new template file but the running D2 process keeps the old one. The &lt;code&gt;restartUnits = [ &quot;kea-dhcp-ddns-server.service&quot; ]&lt;&#x2F;code&gt; line tells sops-nix to restart D2 whenever the rendered template content changes. This pattern is needed any time you &lt;code&gt;mkForce&lt;&#x2F;code&gt; a service’s &lt;code&gt;ExecStart&lt;&#x2F;code&gt; to use a sops template.&lt;&#x2F;p&gt;
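&lt;p&gt;D2 only acts on what the DHCP servers hand it, so &lt;code&gt;kea-dhcp4&lt;&#x2F;code&gt; and &lt;code&gt;kea-dhcp6&lt;&#x2F;code&gt; also need their &lt;code&gt;dhcp-ddns&lt;&#x2F;code&gt; connection enabled. The sketch below matches the D2 listener above. &lt;code&gt;127.0.0.1&lt;&#x2F;code&gt; and &lt;code&gt;53001&lt;&#x2F;code&gt; are Kea’s defaults; they are spelled out here so they visibly line up with the &lt;code&gt;DhcpDdns&lt;&#x2F;code&gt; block.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# Point both DHCP servers at D2 (UDP 127.0.0.1:53001, as configured in DhcpDdns)
services.kea.dhcp4.settings.dhcp-ddns = {
  enable-updates = true;
  server-ip = &amp;quot;127.0.0.1&amp;quot;;
  server-port = 53001;
};
services.kea.dhcp6.settings.dhcp-ddns = {
  enable-updates = true;
  server-ip = &amp;quot;127.0.0.1&amp;quot;;
  server-port = 53001;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;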
&lt;h3 id=&quot;per-subnet-ddns-settings&quot;&gt;Per-subnet DDNS settings&lt;&#x2F;h3&gt;
&lt;p&gt;Each VLAN gets its own qualifying suffix — or no DDNS at all:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# WiFi and wired subnets — register in dhcp.example.net
ddns-send-updates = true;
ddns-qualifying-suffix = &amp;quot;dhcp.example.net.&amp;quot;;

# Guest subnet — register in guest.example.net
ddns-send-updates = true;
ddns-qualifying-suffix = &amp;quot;guest.example.net.&amp;quot;;

# Camera and management subnets — no DDNS
ddns-send-updates = false;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
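&lt;p&gt;In the Kea settings, these keys sit directly on each subnet entry. Here is a sketch of one DHCPv4 subnet. The subnet id, prefix, and pool are placeholders, chosen to match the &lt;code&gt;1.0.10.in-addr.arpa.&lt;&#x2F;code&gt; reverse zone in the D2 config.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.kea.dhcp4.settings.subnet4 = [
  {
    id = 1;  # placeholder subnet id
    subnet = &amp;quot;10.0.1.0&amp;#x2F;24&amp;quot;;  # reverse zone 1.0.10.in-addr.arpa.
    pools = [{ pool = &amp;quot;10.0.1.100 - 10.0.1.199&amp;quot;; }];
    ddns-send-updates = true;
    ddns-qualifying-suffix = &amp;quot;dhcp.example.net.&amp;quot;;
  }
];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;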
&lt;p&gt;Global DDNS settings that apply to both DHCPv4 and DHCPv6:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;ddns-override-client-update = true;
ddns-replace-client-name = &amp;quot;never&amp;quot;;
ddns-update-on-renew = true;
hostname-char-set = &amp;quot;[^A-Za-z0-9.-]&amp;quot;;
hostname-char-replacement = &amp;quot;-&amp;quot;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;hostname-char-set&lt;&#x2F;code&gt; and replacement ensure that devices sending garbage hostnames — and they will — get sanitized into valid DNS labels.&lt;&#x2F;p&gt;
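&lt;p&gt;To see what the sanitizer does to a typical name, the same character class can be applied in plain Nix. This only mimics Kea’s per-character replacement for illustration. &lt;code&gt;sanitizeHostname&lt;&#x2F;code&gt; is a hypothetical helper, not something Kea or nixpkgs provides.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# Hypothetical helper mimicking hostname-char-set &amp;#x2F; hostname-char-replacement.
# builtins.split returns the text between matches as strings and each match as a
# list, so keeping only the strings and joining them with &amp;quot;-&amp;quot; replaces every
# character outside [A-Za-z0-9.-] with a hyphen.
sanitizeHostname = name:
  lib.concatStringsSep &amp;quot;-&amp;quot;
    (builtins.filter builtins.isString (builtins.split &amp;quot;[^A-Za-z0-9.-]&amp;quot; name));

# sanitizeHostname &amp;quot;Bob&amp;#x27;s iPhone&amp;quot; evaluates to &amp;quot;Bob-s-iPhone&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;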
&lt;h3 id=&quot;search-domains-per-vlan&quot;&gt;Search domains per VLAN&lt;&#x2F;h3&gt;
&lt;p&gt;Each network gets appropriate search domains so short names resolve without qualification:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# WiFi and wired: dhcp.example.net, example.net, parent.net
searchDomainWifiWired = {
  code = 119;
  data = &amp;quot;dhcp.example.net, example.net, parent.net&amp;quot;;
  name = &amp;quot;domain-search&amp;quot;;
  space = &amp;quot;dhcp4&amp;quot;;
};

# Management: example.net, parent.net (no dhcp subdomain)
searchDomainMgnt = {
  code = 119;
  data = &amp;quot;example.net, parent.net&amp;quot;;
  name = &amp;quot;domain-search&amp;quot;;
  space = &amp;quot;dhcp4&amp;quot;;
};

# Guest: only guest.example.net
searchDomainGuest = {
  code = 119;
  data = &amp;quot;guest.example.net&amp;quot;;
  name = &amp;quot;domain-search&amp;quot;;
  space = &amp;quot;dhcp4&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Trusted VLANs search both the DDNS zone and the authoritative zone — so &lt;code&gt;ssh myhost&lt;&#x2F;code&gt; resolves whether &lt;code&gt;myhost&lt;&#x2F;code&gt; got its name from a static zone entry or a DHCP lease. Guest gets only its own zone. Management doesn’t need the DDNS subdomain because everything on that VLAN has a static reservation.&lt;&#x2F;p&gt;
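&lt;p&gt;The option definitions above live in the &lt;code&gt;dhcp4&lt;&#x2F;code&gt; space. DHCPv6 clients get their search list from a separate option, &lt;code&gt;domain-search&lt;&#x2F;code&gt; (code 24) in the &lt;code&gt;dhcp6&lt;&#x2F;code&gt; space. Below is a hedged sketch of the trusted-VLAN equivalent, attached through a subnet’s &lt;code&gt;option-data&lt;&#x2F;code&gt;; the variable name is illustrative.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# DHCPv6 counterpart of searchDomainWifiWired (illustrative name)
searchDomainWifiWired6 = {
  code = 24;
  data = &amp;quot;dhcp.example.net, example.net, parent.net&amp;quot;;
  name = &amp;quot;domain-search&amp;quot;;
  space = &amp;quot;dhcp6&amp;quot;;
};

# Options are attached per subnet via option-data, e.g. in a subnet6 entry:
#   option-data = [ searchDomainWifiWired6 ];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;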
&lt;h3 id=&quot;the-complete-data-flow&quot;&gt;The complete data flow&lt;&#x2F;h3&gt;
&lt;p&gt;When it all comes together:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Client gets a DHCP lease from Kea (v4 or v6)&lt;&#x2F;li&gt;
&lt;li&gt;If &lt;code&gt;ddns-send-updates = true&lt;&#x2F;code&gt; for that subnet and the client isn’t tagged &lt;code&gt;SKIP_DDNS&lt;&#x2F;code&gt;:
&lt;ul&gt;
&lt;li&gt;Kea’s DHCP server sends a NameChangeRequest (NCR) to the D2 daemon (UDP port 53001)&lt;&#x2F;li&gt;
&lt;li&gt;D2 sends a TSIG-authenticated forward update to hickory-dns (A&#x2F;AAAA in &lt;code&gt;dhcp.&lt;&#x2F;code&gt; or &lt;code&gt;guest.&lt;&#x2F;code&gt; zone)&lt;&#x2F;li&gt;
&lt;li&gt;D2 sends a TSIG-authenticated reverse update to hickory-dns (PTR in &lt;code&gt;.in-addr.arpa&lt;&#x2F;code&gt; or &lt;code&gt;.ip6.arpa&lt;&#x2F;code&gt; zone)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;Hickory-dns validates the TSIG signature and writes to the sqlite journal&lt;&#x2F;li&gt;
&lt;li&gt;Static reservations — servers with known IPs — only appear in the authoritative forward zone, never duplicated in the DDNS zone&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;h2 id=&quot;prometheus-monitoring&quot;&gt;Prometheus monitoring&lt;&#x2F;h2&gt;
&lt;p&gt;The &lt;code&gt;prometheus-metrics&lt;&#x2F;code&gt; build feature exposes metrics on the address you configured. Scrape config is minimal:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.prometheus.scrapeConfigs = [{
  job_name = &amp;quot;hickory-dns&amp;quot;;
  honor_labels = true;
  static_configs = [{
    targets = [ &amp;quot;127.0.0.1:9153&amp;quot; ];
  }];
}];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The metrics you get are genuinely useful:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;hickory_request_record_types_total&lt;&#x2F;code&gt; — query types (A, AAAA, PTR, HTTPS, MX, SRV, TXT, DS, DNSKEY, SOA, NS)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;hickory_response_codes_total&lt;&#x2F;code&gt; — response codes (NOERROR, NXDOMAIN, SERVFAIL, etc.)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;hickory_request_protocols_total&lt;&#x2F;code&gt; — TCP vs UDP split&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;hickory_recursor_cache_hit_total&lt;&#x2F;code&gt; &#x2F; &lt;code&gt;hickory_recursor_cache_miss_total&lt;&#x2F;code&gt; — cache effectiveness&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;hickory_recursor_cache_hit_duration_seconds_bucket&lt;&#x2F;code&gt; &#x2F; &lt;code&gt;hickory_recursor_cache_miss_duration_seconds_bucket&lt;&#x2F;code&gt; — latency histograms&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;hickory_recursor_response_cache_size&lt;&#x2F;code&gt; &#x2F; &lt;code&gt;hickory_recursor_name_server_cache_size&lt;&#x2F;code&gt; — cache fill levels&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;hickory_recursor_in_flight_queries&lt;&#x2F;code&gt; — concurrent query count&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;hickory_zone_lookups_total&lt;&#x2F;code&gt; — per-zone lookups by handler and success&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;hickory_zone_records_total&lt;&#x2F;code&gt; — record count per zone&lt;&#x2F;li&gt;
&lt;li&gt;Standard process metrics — RSS, virtual memory, CPU seconds, threads, open FDs&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
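&lt;p&gt;Those same series can drive alerts as well as graphs. Here is a hedged sketch using NixOS’s &lt;code&gt;services.prometheus.rules&lt;&#x2F;code&gt;, which takes rule files as strings. JSON is valid YAML, so &lt;code&gt;builtins.toJSON&lt;&#x2F;code&gt; can render them. The alert names, windows, and the 50% threshold are assumptions to tune. Alerts only reach a person once an Alertmanager is configured.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.prometheus.rules = [
  # JSON is valid YAML, so builtins.toJSON can render the rule file
  (builtins.toJSON {
    groups = [{
      name = &amp;quot;hickory-dns&amp;quot;;
      rules = [
        {
          # Scrape target missing: server down or metrics endpoint unreachable
          alert = &amp;quot;HickoryDnsDown&amp;quot;;
          expr = &amp;#x27;&amp;#x27;up{job=&amp;quot;hickory-dns&amp;quot;} == 0&amp;#x27;&amp;#x27;;
          for = &amp;quot;5m&amp;quot;;
        }
        {
          # Recursor cache hit ratio below 50% for a sustained period
          alert = &amp;quot;HickoryDnsLowCacheHitRatio&amp;quot;;
          expr = &amp;#x27;&amp;#x27;
            sum(rate(hickory_recursor_cache_hit_total{job=&amp;quot;hickory-dns&amp;quot;}[15m]))
              &amp;#x2F; (sum(rate(hickory_recursor_cache_hit_total{job=&amp;quot;hickory-dns&amp;quot;}[15m]))
                 + sum(rate(hickory_recursor_cache_miss_total{job=&amp;quot;hickory-dns&amp;quot;}[15m])))
              &amp;lt; 0.5
          &amp;#x27;&amp;#x27;;
          for = &amp;quot;30m&amp;quot;;
        }
      ];
    }];
  })
];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;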
&lt;h2 id=&quot;grafana-dashboard&quot;&gt;Grafana dashboard&lt;&#x2F;h2&gt;
&lt;p&gt;There is no pre-made Grafana dashboard for hickory-dns. Had to build one from scratch by reading the metrics endpoint. It has four sections and fourteen panels.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;overview&quot;&gt;Overview&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Query Rate (by type)&lt;&#x2F;strong&gt; — stacked area chart: &lt;code&gt;sum by (type) (rate(hickory_request_record_types_total{job=&quot;hickory-dns&quot;, type=~&quot;a|aaaa|ptr|https|mx|srv|txt|ds|dnskey|soa|ns&quot;}[$__rate_interval]))&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Response Rate (by rcode)&lt;&#x2F;strong&gt; — &lt;code&gt;sum by (code) (rate(hickory_response_codes_total{job=&quot;hickory-dns&quot;}[$__rate_interval])) &amp;gt; 0&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Recursor Latency&lt;&#x2F;strong&gt; — p50&#x2F;p95 from &lt;code&gt;histogram_quantile&lt;&#x2F;code&gt; on both &lt;code&gt;hickory_recursor_cache_miss_duration_seconds_bucket&lt;&#x2F;code&gt; and &lt;code&gt;hickory_recursor_cache_hit_duration_seconds_bucket&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Requests by Protocol&lt;&#x2F;strong&gt; — &lt;code&gt;sum by (protocol) (rate(hickory_request_protocols_total{job=&quot;hickory-dns&quot;}[$__rate_interval]))&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;cache-and-recursor&quot;&gt;Cache and recursor&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Cache Hit&#x2F;Miss Rate&lt;&#x2F;strong&gt; — &lt;code&gt;rate(hickory_recursor_cache_hit_total{...}[$__rate_interval])&lt;&#x2F;code&gt; and the miss equivalent&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Cache Hit Ratio&lt;&#x2F;strong&gt; — gauge panel: &lt;code&gt;hit_rate &#x2F; (hit_rate + miss_rate)&lt;&#x2F;code&gt; with thresholds: red below 50%, yellow 50–80%, green above 80%&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Cache Sizes &amp;amp; In-Flight&lt;&#x2F;strong&gt; — &lt;code&gt;hickory_recursor_response_cache_size&lt;&#x2F;code&gt;, &lt;code&gt;hickory_recursor_name_server_cache_size&lt;&#x2F;code&gt;, &lt;code&gt;hickory_recursor_in_flight_queries&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;zones&quot;&gt;Zones&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Zone Lookups (by handler)&lt;&#x2F;strong&gt; — &lt;code&gt;sum by (zone_handler, success) (rate(hickory_zone_lookups_total{...}[$__rate_interval]))&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Zone Records Count&lt;&#x2F;strong&gt; — &lt;code&gt;hickory_zone_records_total&lt;&#x2F;code&gt; with legend format &lt;code&gt;{{zone_handler}} {{type}} {{role}}&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h3 id=&quot;process&quot;&gt;Process&lt;&#x2F;h3&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Memory Usage&lt;&#x2F;strong&gt; — RSS vs virtual&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;CPU &#x2F; Threads &#x2F; FDs&lt;&#x2F;strong&gt; — &lt;code&gt;rate(process_cpu_seconds_total{...}[$__rate_interval])&lt;&#x2F;code&gt;, &lt;code&gt;process_threads&lt;&#x2F;code&gt;, &lt;code&gt;process_open_fds&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Design decisions: a &lt;code&gt;${datasource}&lt;&#x2F;code&gt; template variable for Prometheus datasource selection, &lt;code&gt;$__rate_interval&lt;&#x2F;code&gt; everywhere for proper rate calculations, stacked area charts for rates, line charts for latencies, table legends with mean&#x2F;max calcs. Default time range is six hours.&lt;&#x2F;p&gt;
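&lt;p&gt;To keep the dashboard as declarative as everything else, Grafana can load it from the repository at startup. The sketch below uses the NixOS Grafana module’s provisioning options. It assumes the exported dashboard JSON sits in a hypothetical &lt;code&gt;.&#x2F;dashboards&lt;&#x2F;code&gt; directory next to the module.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.grafana.provision = {
  enable = true;
  dashboards.settings.providers = [{
    name = &amp;quot;hickory-dns&amp;quot;;
    # Hypothetical directory holding the exported dashboard JSON
    options.path = .&amp;#x2F;dashboards;
  }];
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The panels select their datasource through the &lt;code&gt;${datasource}&lt;&#x2F;code&gt; variable rather than a hard-coded UID. The provisioned copy should therefore work against whichever Prometheus datasource the Grafana host already has.&lt;&#x2F;p&gt;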
&lt;h2 id=&quot;firewall-rules&quot;&gt;Firewall rules&lt;&#x2F;h2&gt;
&lt;p&gt;DNS needs to be reachable from every VLAN that should resolve. For nftables:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;# Trusted networks (full access)
ip saddr 10.0.0.0&amp;#x2F;8 tcp dport 53 counter accept comment &amp;quot;lan dns tcp&amp;quot;
ip saddr 10.0.0.0&amp;#x2F;8 udp dport 53 counter accept comment &amp;quot;lan dns udp&amp;quot;

# Isolated networks (DNS + DHCP only, everything else dropped)
iifname &amp;quot;guest&amp;quot; udp dport { 53, 67, 68 } counter accept comment &amp;quot;guest dns+dhcp&amp;quot;
iifname &amp;quot;guest&amp;quot; tcp dport 53 counter accept comment &amp;quot;guest dns tcp&amp;quot;

iifname &amp;quot;camera&amp;quot; udp dport { 53, 67, 68 } counter accept comment &amp;quot;camera dns+dhcp&amp;quot;
iifname &amp;quot;camera&amp;quot; tcp dport 53 counter accept comment &amp;quot;camera dns tcp&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Guest and camera networks are isolated — they can reach DNS and DHCP, nothing else. Trusted interfaces get full access. The DNS server is the gateway, so it’s reachable on every VLAN interface without extra routing.&lt;&#x2F;p&gt;
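&lt;p&gt;If the host’s rules come from the NixOS firewall module rather than a hand-written ruleset, the isolated VLANs can be expressed per interface. This sketch assumes the interface names &lt;code&gt;guest&lt;&#x2F;code&gt; and &lt;code&gt;camera&lt;&#x2F;code&gt; from the ruleset above. These options open the ports for IPv4 and IPv6 alike, whereas the &lt;code&gt;ip saddr&lt;&#x2F;code&gt; rule for trusted networks matches IPv4 only.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;networking.firewall.interfaces = {
  # DNS over UDP and TCP plus the DHCPv4 server port; add 547 if a VLAN also gets DHCPv6
  guest = {
    allowedUDPPorts = [ 53 67 ];
    allowedTCPPorts = [ 53 ];
  };
  camera = {
    allowedUDPPorts = [ 53 67 ];
    allowedTCPPorts = [ 53 ];
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;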
&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping up&lt;&#x2F;h2&gt;
&lt;p&gt;The whole setup is one Nix module that generates zone files from a host registry, builds hickory-dns with the right features, renders a TOML config, runs a hardened systemd service, and wires up Kea D2 for dynamic updates. Add a host to the registry, rebuild, and forward DNS, reverse DNS, and DHCP reservations all update together. The Prometheus metrics are there from day one, and the Grafana dashboard — since nobody else has built one yet — gives you query rates, cache hit ratios, recursor latency, and per-zone lookup breakdowns. It’s a lot of moving parts, but once it’s declarative, the moving parts stop being your problem.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>NixOS on a Budget VPS: Getting Under the 2GB RAM Floor</title>
        <published>2026-04-01T12:00:00+00:00</published>
        <updated>2026-04-01T12:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/nixos-low-memory-vps-nixos-infect-kexec/"/>
        <id>https://perlpimp.net/blog/nixos-low-memory-vps-nixos-infect-kexec/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/nixos-low-memory-vps-nixos-infect-kexec/">&lt;p&gt;You found a VPS deal. Three bucks a month, 1GB RAM, somewhere in Eastern Europe. You want NixOS on it. You run nixos-anywhere and it dies halfway through because the kexec installer needs 2GB to do its thing. Your three-dollar dream just hit a wall.&lt;&#x2F;p&gt;
&lt;p&gt;This is the RAM floor problem. The standard nixos-anywhere workflow boots a kexec image that replaces the running kernel with a minimal NixOS installer. That installer needs enough memory to hold itself, the Nix store, and the build artifacts, which comes out to roughly 2GB. Below that, it either gets OOM-killed or hangs.&lt;&#x2F;p&gt;
&lt;p&gt;Two approaches get you past it:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;nixos-infect&lt;&#x2F;strong&gt; converts the running OS in-place without kexec. Works down to &lt;strong&gt;768MB RAM&lt;&#x2F;strong&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;A custom kexec image&lt;&#x2F;strong&gt; strips the installer down to fit in ~&lt;strong&gt;1GB RAM&lt;&#x2F;strong&gt;, keeping the nixos-anywhere workflow intact.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;nixos-infect-the-in-place-conversion&quot;&gt;nixos-infect: the in-place conversion&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;elitak&#x2F;nixos-infect&quot;&gt;nixos-infect&lt;&#x2F;a&gt; takes whatever Linux distro your VPS booted — usually Ubuntu or Debian — and converts it to NixOS while it’s running. No kexec, no installer image, no second OS in memory. It downloads Nix, builds a NixOS system closure, installs the bootloader, and reboots into your new system.&lt;&#x2F;p&gt;
&lt;p&gt;The tradeoff is that it keeps the existing partition layout. No disko, no declarative disk formatting. Whatever filesystem your provider set up is what you get. For a cheap VPS where the disk layout is “one ext4 partition and a prayer,” that’s usually fine.&lt;&#x2F;p&gt;
&lt;p&gt;Here’s the invocation that actually works on budget providers:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;ssh root@your-vps &amp;quot;curl -sL https:&amp;#x2F;&amp;#x2F;github.com&amp;#x2F;elitak&amp;#x2F;nixos-infect&amp;#x2F;raw&amp;#x2F;master&amp;#x2F;nixos-infect \
  | sed &amp;#x27;s|mktemp &amp;#x2F;tmp&amp;#x2F;nixos-infect|mktemp &amp;#x2F;var&amp;#x2F;tmp&amp;#x2F;nixos-infect|g&amp;#x27; \
  | doNetConf=y NIX_CHANNEL=nixos-unstable bash -x&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
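&lt;p&gt;Beyond the obvious &lt;code&gt;curl | bash&lt;&#x2F;code&gt;, three things are happening: the &lt;code&gt;sed&lt;&#x2F;code&gt; patch, &lt;code&gt;doNetConf=y&lt;&#x2F;code&gt;, and &lt;code&gt;NIX_CHANNEL=nixos-unstable&lt;&#x2F;code&gt;, which selects the nixpkgs channel the new system is built from.&lt;&#x2F;p&gt;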
&lt;h3 id=&quot;the-sed-patch-surviving-small-tmp&quot;&gt;The &lt;code&gt;sed&lt;&#x2F;code&gt; patch: surviving small &lt;code&gt;&#x2F;tmp&lt;&#x2F;code&gt;&lt;&#x2F;h3&gt;
&lt;p&gt;Many budget providers mount &lt;code&gt;&#x2F;tmp&lt;&#x2F;code&gt; as a small tmpfs — sometimes just tens of megabytes. nixos-infect creates temporary files via &lt;code&gt;mktemp &#x2F;tmp&#x2F;nixos-infect.*&lt;&#x2F;code&gt;, and on these providers that write hits the tiny tmpfs and fails silently or with a cryptic “no space left on device” error.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;code&gt;sed&lt;&#x2F;code&gt; rewrites the path from &lt;code&gt;&#x2F;tmp&#x2F;nixos-infect&lt;&#x2F;code&gt; to &lt;code&gt;&#x2F;var&#x2F;tmp&#x2F;nixos-infect&lt;&#x2F;code&gt;, which lives on the real disk. It’s one substitution that turns a mysterious failure into a working install.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;sed &amp;#x27;s|mktemp &amp;#x2F;tmp&amp;#x2F;nixos-infect|mktemp &amp;#x2F;var&amp;#x2F;tmp&amp;#x2F;nixos-infect|g&amp;#x27;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If your provider gives &lt;code&gt;&#x2F;tmp&lt;&#x2F;code&gt; real disk space, the patch is harmless. If it doesn’t, the patch is mandatory.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;donetconf-y-keeping-your-network-alive&quot;&gt;&lt;code&gt;doNetConf=y&lt;&#x2F;code&gt;: keeping your network alive&lt;&#x2F;h3&gt;
&lt;p&gt;Most budget VPS hosts use static IP addressing. Without &lt;code&gt;doNetConf=y&lt;&#x2F;code&gt;, nixos-infect doesn’t capture the current network configuration before rebooting. The machine comes back up with NixOS, but NixOS has no idea what its IP, gateway, or DNS servers are. You’re locked out.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;doNetConf=y&lt;&#x2F;code&gt; tells nixos-infect to snapshot the active network config — IPs, gateways, DNS resolvers — and bake it into the generated NixOS configuration. For any host that isn’t running DHCP, this flag is not optional.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;what-survives-the-reboot&quot;&gt;What survives the reboot&lt;&#x2F;h3&gt;
&lt;p&gt;nixos-infect preserves your root SSH authorized keys, so you keep access after the conversion. The existing partition layout stays as-is. What you get after reboot is a bare NixOS system with your SSH keys and a working network — a blank canvas to deploy your actual config onto.&lt;&#x2F;p&gt;
&lt;p&gt;After reboot, deploy your real NixOS configuration with &lt;code&gt;nixos-rebuild switch --flake&lt;&#x2F;code&gt; or whatever your deployment tool of choice is. nixos-infect just gets you a bootable NixOS base.&lt;&#x2F;p&gt;
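&lt;p&gt;When you move to your own configuration, the network settings captured by &lt;code&gt;doNetConf=y&lt;&#x2F;code&gt; have to come along. nixos-infect writes them into the generated configuration under &lt;code&gt;&#x2F;etc&#x2F;nixos&lt;&#x2F;code&gt;. A flake that leaves them out takes the box off the network again on the first switch. Below is a sketch of the equivalent static settings. The interface name, the addresses (from the 203.0.113.0&#x2F;24 documentation range), and the resolvers are placeholders; replace them with the values nixos-infect captured.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;networking = {
  useDHCP = false;
  # Placeholders: copy the real values from the config nixos-infect generated
  interfaces.eth0.ipv4.addresses = [{
    address = &amp;quot;203.0.113.10&amp;quot;;
    prefixLength = 24;
  }];
  defaultGateway = &amp;quot;203.0.113.1&amp;quot;;
  nameservers = [ &amp;quot;1.1.1.1&amp;quot; &amp;quot;9.9.9.9&amp;quot; ];
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;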
&lt;h2 id=&quot;custom-kexec-nixos-anywhere-on-a-diet&quot;&gt;Custom kexec: nixos-anywhere on a diet&lt;&#x2F;h2&gt;
&lt;p&gt;If you want the full nixos-anywhere experience — declarative partitioning with disko, a clean install, the works — but your VPS only has around 1GB of RAM, you need a slimmer kexec image.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;nix-community&#x2F;nixos-anywhere&quot;&gt;standard nixos-anywhere kexec&lt;&#x2F;a&gt; bundles a comfortable installer environment. Comfortable costs memory. A &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;gitlab.com&#x2F;-&#x2F;snippets&#x2F;5973159&quot;&gt;custom kexec image&lt;&#x2F;a&gt; strips that down to the bare minimum.&lt;&#x2F;p&gt;
&lt;p&gt;Here’s what that minimal kexec module looks like:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  self,
  modulesPath,
  lib,
  config,
  pkgs,
  ...
}: {
  imports = [
    &amp;quot;${modulesPath}&amp;#x2F;installer&amp;#x2F;netboot&amp;#x2F;netboot.nix&amp;quot;
    &amp;quot;${modulesPath}&amp;#x2F;profiles&amp;#x2F;minimal.nix&amp;quot;
    &amp;quot;${modulesPath}&amp;#x2F;profiles&amp;#x2F;qemu-guest.nix&amp;quot;
  ];

  system.nixos.variant_id = &amp;quot;kexec&amp;quot;;

  system.build.kexec_compressed =
    pkgs.runCommand &amp;quot;kexec-compressed&amp;quot;
      { buildInputs = [ pkgs.gnutar ]; }
      &amp;#x27;&amp;#x27;
        mkdir $out
        tar -cvzhf $out&amp;#x2F;kexec.tar.gz -C ${config.system.build.kexecTree} \
          bzImage \
          initrd.gz \
          kexec-boot
      &amp;#x27;&amp;#x27;;

  boot = {
    initrd.availableKernelModules = [ &amp;quot;ata_piix&amp;quot; &amp;quot;uhci_hcd&amp;quot; ];
    kernelParams = [
      &amp;quot;panic=30&amp;quot;
      &amp;quot;boot.panic_on_fail&amp;quot;
      &amp;quot;console=ttyS0&amp;quot;
      &amp;quot;console=tty1&amp;quot;
    ];
    kernel.sysctl.&amp;quot;vm.overcommit_memory&amp;quot; = lib.mkForce &amp;quot;1&amp;quot;;
    supportedFilesystems = [ &amp;quot;zfs&amp;quot; ];
  };

  environment.variables.GC_INITIAL_HEAP_SIZE = &amp;quot;1M&amp;quot;;

  documentation.enable = false;
  documentation.nixos.enable = false;
  fonts.fontconfig.enable = false;
  programs.bash.completion.enable = false;
  programs.command-not-found.enable = false;
  security.polkit.enable = false;
  security.rtkit.enable = lib.mkForce false;
  services.udisks2.enable = false;
  i18n.supportedLocales = [
    (config.i18n.defaultLocale + &amp;quot;&amp;#x2F;UTF-8&amp;quot;)
  ];

  networking.hostName = &amp;quot;kexec&amp;quot;;
  networking.hostId = &amp;quot;88ad2520&amp;quot;;

  services = {
    getty.autologinUser = lib.mkForce &amp;quot;root&amp;quot;;
    openssh = {
      enable = true;
      settings.KbdInteractiveAuthentication = false;
      settings.PasswordAuthentication = false;
    };
  };

  environment.etc.is_kexec.text = &amp;quot;&amp;quot;;

  system.stateVersion = &amp;quot;23.11&amp;quot;;

  users.users.root.openssh.authorizedKeys.keys = [
    # your SSH public key here
  ];
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The key decisions that shave off memory:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;profiles&#x2F;minimal.nix&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; instead of the full installer profile — no man pages, no extra tooling&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;vm.overcommit_memory = &quot;1&quot;&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; — tells the kernel to always say yes to memory allocations and sort it out later via OOM, rather than refusing allocations conservatively. Aggressive, but it’s a throwaway installer environment&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;GC_INITIAL_HEAP_SIZE = &quot;1M&quot;&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; — keeps the Nix garbage collector’s initial heap small&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Everything non-essential disabled&lt;&#x2F;strong&gt; — documentation, fonts, bash completion, polkit, udisks, command-not-found. Every service you don’t load is memory you keep&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;&#x2F;etc&#x2F;is_kexec&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; marker file — tells nixos-anywhere it’s already in a kexec environment so it doesn’t try to kexec again&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
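&lt;p&gt;The build command below assumes the module is exposed as a flake package. Here is a minimal sketch of that wiring. It assumes the module is saved as a hypothetical &lt;code&gt;.&#x2F;kexec.nix&lt;&#x2F;code&gt; and targets x86_64. &lt;code&gt;specialArgs&lt;&#x2F;code&gt; passes &lt;code&gt;self&lt;&#x2F;code&gt; because the module’s argument list asks for it.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs.nixpkgs.url = &amp;quot;github:NixOS&amp;#x2F;nixpkgs&amp;#x2F;nixos-unstable&amp;quot;;

  outputs = { self, nixpkgs, ... }: {
    # Built with: nix build .#kexec-image  (tarball lands at result&amp;#x2F;kexec.tar.gz)
    packages.x86_64-linux.kexec-image =
      (nixpkgs.lib.nixosSystem {
        system = &amp;quot;x86_64-linux&amp;quot;;
        # The module asks for self in its arguments, so pass it through
        specialArgs = { inherit self; };
        modules = [ .&amp;#x2F;kexec.nix ];  # hypothetical file holding the module above
      }).config.system.build.kexec_compressed;
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;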
&lt;p&gt;Build the image and point nixos-anywhere at it:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix build .#kexec-image
nixos-anywhere --flake .#my-vps --kexec .&amp;#x2F;result&amp;#x2F;kexec.tar.gz root@your-vps
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;You get the full nixos-anywhere workflow — disko partitioning, clean install, proper host keys — on a box that would choke on the default kexec image.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;when-to-use-which&quot;&gt;When to use which&lt;&#x2F;h2&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;RAM&lt;&#x2F;th&gt;&lt;th&gt;Approach&lt;&#x2F;th&gt;&lt;th&gt;What you get&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&amp;lt; 768MB&lt;&#x2F;td&gt;&lt;td&gt;Neither — upgrade or look elsewhere&lt;&#x2F;td&gt;&lt;td&gt;Pain&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;768MB – 1GB&lt;&#x2F;td&gt;&lt;td&gt;nixos-infect&lt;&#x2F;td&gt;&lt;td&gt;NixOS, existing partitions, no disko&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;~1GB&lt;&#x2F;td&gt;&lt;td&gt;Custom kexec + nixos-anywhere&lt;&#x2F;td&gt;&lt;td&gt;Full nixos-anywhere with disko&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;2GB+&lt;&#x2F;td&gt;&lt;td&gt;Standard nixos-anywhere&lt;&#x2F;td&gt;&lt;td&gt;Everything works out of the box&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;nixos-infect is the safer bet at the low end. It doesn’t need to hold an entire installer OS in memory — it mutates the running system. The downside is you inherit whatever disk layout your provider gave you.&lt;&#x2F;p&gt;
&lt;p&gt;The custom kexec approach is better if you care about declarative partitioning or need a clean-slate install. But it’s cutting it close at 1GB. If the provider’s RAM accounting is generous and nothing else on the VM is eating into that memory, it works. If they overcommit heavily on the host side, you might still hit the wall.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;provider-gotchas&quot;&gt;Provider gotchas&lt;&#x2F;h2&gt;
&lt;p&gt;A few things to watch for on the LowEndBox-tier providers where this matters:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;&#x2F;tmp&lt;&#x2F;code&gt; as tmpfs.&lt;&#x2F;strong&gt; Already covered above, but worth repeating. If &lt;code&gt;df -h &#x2F;tmp&lt;&#x2F;code&gt; shows a tmpfs mount of 50–100MB, the &lt;code&gt;sed&lt;&#x2F;code&gt; patch for nixos-infect is mandatory.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Static networking.&lt;&#x2F;strong&gt; The vast majority of budget providers don’t use DHCP. &lt;code&gt;doNetConf=y&lt;&#x2F;code&gt; is not optional. Forget it once, learn the lesson permanently.&lt;&#x2F;p&gt;
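&lt;p&gt;The same holds after the install. With &lt;code&gt;doNetConf=y&lt;&#x2F;code&gt;, nixos-infect writes your static addressing into the configuration it generates. But the flake you later push with &lt;code&gt;nixos-rebuild&lt;&#x2F;code&gt; has to carry the same settings, or the box comes up without a working network after the switch. Here is a minimal sketch. It assumes the interface is called &lt;code&gt;eth0&lt;&#x2F;code&gt; and uses documentation addresses from &lt;code&gt;203.0.113.0&#x2F;24&lt;&#x2F;code&gt; as placeholders. Copy the real values from the generated config or your provider’s panel:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ ... }:

{
  # Placeholder values: use your provider&amp;#x27;s address, gateway, and resolvers.
  # The interface name is an assumption; check `ip link` on the box.
  networking.useDHCP = false;
  networking.interfaces.eth0.ipv4.addresses = [
    { address = &amp;quot;203.0.113.10&amp;quot;; prefixLength = 24; }
  ];
  networking.defaultGateway = &amp;quot;203.0.113.1&amp;quot;;
  networking.nameservers = [ &amp;quot;1.1.1.1&amp;quot; &amp;quot;9.9.9.9&amp;quot; ];
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;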
&lt;p&gt;&lt;strong&gt;Swap.&lt;&#x2F;strong&gt; Some providers give you swap, some don’t. If your 1GB VPS has no swap and you’re trying the custom kexec route, you’re working without a safety net. Consider adding a swap file to your NixOS config as one of the first things you deploy.&lt;&#x2F;p&gt;
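&lt;p&gt;In NixOS that takes a few lines. The sketch below assumes a 1GB swap file at &lt;code&gt;&#x2F;var&#x2F;lib&#x2F;swapfile&lt;&#x2F;code&gt;; both the path and the size are just examples. It only helps once the installed system is running. The kexec installer still has to fit in RAM on its own.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ ... }:

{
  # With size set, NixOS creates this swap file automatically; size is in MiB.
  swapDevices = [
    { device = &amp;quot;&amp;#x2F;var&amp;#x2F;lib&amp;#x2F;swapfile&amp;quot;; size = 1024; }
  ];

  # Alternative without a disk file: compressed swap kept in RAM.
  # zramSwap.enable = true;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;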
&lt;p&gt;&lt;strong&gt;OpenVZ vs KVM.&lt;&#x2F;strong&gt; OpenVZ containers can’t kexec at all — the host kernel is shared. nixos-infect is your only option on OpenVZ, and even then it’s hit-or-miss depending on what the container exposes. KVM is the safe choice for NixOS.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;after-the-install&quot;&gt;After the install&lt;&#x2F;h2&gt;
&lt;p&gt;Whichever path you took, you now have a minimal NixOS system with SSH access. From here:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nixos-rebuild switch --flake .#my-vps --target-host root@your-vps
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Push your real configuration. Set up your services. The hard part — getting NixOS onto a box that wasn’t designed for it — is done. Everything from here is just regular NixOS.&lt;&#x2F;p&gt;
&lt;p&gt;Three dollars a month, 1GB of RAM, and a fully declarative operating system. Not bad for a provider that only officially supports Ubuntu.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Self-Hosting a Private Nix Binary Cache with Attic and Garage</title>
        <published>2026-03-31T12:00:00+00:00</published>
        <updated>2026-03-31T12:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/self-hosted-nix-binary-cache-attic-garage/"/>
        <id>https://perlpimp.net/blog/self-hosted-nix-binary-cache-attic-garage/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/self-hosted-nix-binary-cache-attic-garage/">&lt;p&gt;You want a private Nix binary cache. Not “throw artifacts into S3 and hope for the best” private. A real cache server, scoped credentials, multiple teams, and CI that pushes build outputs automatically. Basically the Cachix model, but self-hosted and under your control.&lt;&#x2F;p&gt;
&lt;p&gt;This stack does that cleanly:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Garage&lt;&#x2F;strong&gt; provides S3-compatible object storage for NAR files&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Attic&lt;&#x2F;strong&gt; provides the cache API, signing keys, cache metadata, and auth tokens&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;PostgreSQL&lt;&#x2F;strong&gt; stores Attic metadata&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;sops-nix&lt;&#x2F;strong&gt; wires in secrets without hardcoding credentials in your Nix config&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;GitHub Actions&lt;&#x2F;strong&gt; pushes build outputs into the cache on trusted builds&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The useful twist is Attic’s token model. You can hand a team one token scoped to &lt;code&gt;teamname-*&lt;&#x2F;code&gt;, and they can create and manage &lt;code&gt;teamname-dev&lt;&#x2F;code&gt;, &lt;code&gt;teamname-prod&lt;&#x2F;code&gt;, &lt;code&gt;teamname-whatever&lt;&#x2F;code&gt; without touching anyone else’s caches. That’s the self-hosted private Cachix pitch.&lt;&#x2F;p&gt;
&lt;p&gt;This post walks through the whole setup on NixOS. It does &lt;strong&gt;not&lt;&#x2F;strong&gt; cover per-cache vanity hostnames through nginx. Attic still exposes one global substituter endpoint, so path-based cache URLs are the model today.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-split-attic-and-garage&quot;&gt;Why split Attic and Garage&lt;&#x2F;h2&gt;
&lt;p&gt;Attic wants durable blob storage plus a relational database. Garage gives you the blob layer without needing MinIO, Ceph, or a cloud bucket. PostgreSQL handles Attic’s metadata. That separation matters:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;NAR files live in S3-compatible object storage&lt;&#x2F;li&gt;
&lt;li&gt;Cache definitions, ACLs, and token state live in PostgreSQL&lt;&#x2F;li&gt;
&lt;li&gt;The cache server itself stays stateless: what it needs lives in PostgreSQL, in Garage, and in its JWT signing secret&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;For a single-node deployment, Garage with SQLite is enough. If you need more later, you can grow from there without replacing the cache server.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;garage-s3-backend-for-nar-storage&quot;&gt;Garage: S3 backend for NAR storage&lt;&#x2F;h2&gt;
&lt;p&gt;Garage is the object store Attic writes to. The NixOS side is straightforward: run Garage locally, expose the S3 API through nginx, and bootstrap the initial layout, S3 key, and bucket with an idempotent oneshot service.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ config, pkgs, ... }:

let
  garageS3Host = &amp;quot;s3.example.com&amp;quot;;
  garageZone = &amp;quot;est1&amp;quot;;
  garageCapacity = &amp;quot;1T&amp;quot;;

  garageBootstrap = pkgs.writeShellScript &amp;quot;garage-bootstrap&amp;quot; &amp;#x27;&amp;#x27;
    set -euo pipefail

    for _ in $(seq 1 30); do
      if node_id=&amp;quot;$(${config.services.garage.package}&amp;#x2F;bin&amp;#x2F;garage node id 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null | cut -d@ -f1)&amp;quot; &amp;amp;&amp;amp; [ -n &amp;quot;$node_id&amp;quot; ]; then
        break
      fi
      sleep 1
    done

    if [ -z &amp;quot;&amp;#x27;&amp;#x27;${node_id:-}&amp;quot; ]; then
      echo &amp;quot;garage node id did not become available in time&amp;quot; &amp;gt;&amp;amp;2
      exit 1
    fi

    if ${config.services.garage.package}&amp;#x2F;bin&amp;#x2F;garage status | grep -q &amp;#x27;NO ROLE ASSIGNED&amp;#x27;; then
      ${config.services.garage.package}&amp;#x2F;bin&amp;#x2F;garage layout assign -z ${garageZone} -c ${garageCapacity} &amp;quot;$node_id&amp;quot;

      current_version=&amp;quot;$(${config.services.garage.package}&amp;#x2F;bin&amp;#x2F;garage layout show | sed -n &amp;#x27;s&amp;#x2F;^Current cluster layout version: \([0-9][0-9]*\)$&amp;#x2F;\1&amp;#x2F;p&amp;#x27;)&amp;quot;
      if [ -n &amp;quot;$current_version&amp;quot; ]; then
        next_version=$((current_version + 1))
      else
        next_version=1
      fi

      ${config.services.garage.package}&amp;#x2F;bin&amp;#x2F;garage layout apply --version &amp;quot;$next_version&amp;quot;
    fi

    if ! ${config.services.garage.package}&amp;#x2F;bin&amp;#x2F;garage key info &amp;quot;$(cat ${config.sops.secrets.garage_attic_key_id.path})&amp;quot; &amp;gt;&amp;#x2F;dev&amp;#x2F;null 2&amp;gt;&amp;amp;1; then
      ${config.services.garage.package}&amp;#x2F;bin&amp;#x2F;garage key import \
        &amp;quot;$(cat ${config.sops.secrets.garage_attic_key_id.path})&amp;quot; \
        &amp;quot;$(cat ${config.sops.secrets.garage_attic_secret_key.path})&amp;quot; \
        -n attic \
        --yes
    fi

    if ! ${config.services.garage.package}&amp;#x2F;bin&amp;#x2F;garage bucket info attic &amp;gt;&amp;#x2F;dev&amp;#x2F;null 2&amp;gt;&amp;amp;1; then
      ${config.services.garage.package}&amp;#x2F;bin&amp;#x2F;garage bucket create attic
    fi

    ${config.services.garage.package}&amp;#x2F;bin&amp;#x2F;garage bucket allow \
      --read \
      --write \
      --owner \
      attic \
      --key &amp;quot;$(cat ${config.sops.secrets.garage_attic_key_id.path})&amp;quot;
  &amp;#x27;&amp;#x27;;
in
{
  users.groups.garage = { };

  users.users.garage = {
    isSystemUser = true;
    group = &amp;quot;garage&amp;quot;;
    description = &amp;quot;Garage object storage service&amp;quot;;
  };

  sops.secrets.garage_rpc_secret = {
    mode = &amp;quot;0400&amp;quot;;
    owner = &amp;quot;garage&amp;quot;;
    group = &amp;quot;garage&amp;quot;;
  };

  sops.secrets.garage_admin_token = {
    mode = &amp;quot;0400&amp;quot;;
    owner = &amp;quot;garage&amp;quot;;
    group = &amp;quot;garage&amp;quot;;
  };

  sops.secrets.garage_metrics_token = {
    mode = &amp;quot;0400&amp;quot;;
    owner = &amp;quot;garage&amp;quot;;
    group = &amp;quot;garage&amp;quot;;
  };

  sops.secrets.garage_attic_key_id = { };
  sops.secrets.garage_attic_secret_key = { };

  services.garage = {
    enable = true;
    package = pkgs.garage;
    logLevel = &amp;quot;info&amp;quot;;

    settings = {
      replication_factor = 1;
      db_engine = &amp;quot;sqlite&amp;quot;;

      rpc_bind_addr = &amp;quot;127.0.0.1:3901&amp;quot;;
      rpc_public_addr = &amp;quot;127.0.0.1:3901&amp;quot;;
      rpc_secret_file = config.sops.secrets.garage_rpc_secret.path;

      allow_world_readable_secrets = false;

      s3_api = {
        api_bind_addr = &amp;quot;127.0.0.1:3900&amp;quot;;
        s3_region = &amp;quot;garage&amp;quot;;
      };

      admin = {
        api_bind_addr = &amp;quot;127.0.0.1:3903&amp;quot;;
        admin_token_file = config.sops.secrets.garage_admin_token.path;
        metrics_token_file = config.sops.secrets.garage_metrics_token.path;
      };
    };
  };

  systemd.services.garage.serviceConfig = {
    DynamicUser = false;
    User = &amp;quot;garage&amp;quot;;
    Group = &amp;quot;garage&amp;quot;;
  };

  services.nginx.virtualHosts.${garageS3Host} = {
    forceSSL = true;
    enableACME = true;
    acmeRoot = null;
    locations.&amp;quot;&amp;#x2F;&amp;quot; = {
      proxyPass = &amp;quot;http:&amp;#x2F;&amp;#x2F;127.0.0.1:3900&amp;quot;;
      extraConfig = &amp;#x27;&amp;#x27;
        client_max_body_size 0;
        proxy_request_buffering off;
        proxy_buffering off;
      &amp;#x27;&amp;#x27;;
    };
  };

  systemd.services.garage-bootstrap = {
    description = &amp;quot;Bootstrap Garage layout and Attic bucket&amp;quot;;
    wantedBy = [ &amp;quot;multi-user.target&amp;quot; ];
    after = [ &amp;quot;garage.service&amp;quot; ];
    requires = [ &amp;quot;garage.service&amp;quot; ];
    path = with pkgs; [
      coreutils
      gnugrep
      gnused
    ];
    serviceConfig = {
      Type = &amp;quot;oneshot&amp;quot;;
      RemainAfterExit = true;
      ExecStart = garageBootstrap;
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The important part is the bootstrap script. It makes the single-node layout assignment, imports the access key Attic will use, creates the &lt;code&gt;attic&lt;&#x2F;code&gt; bucket, and grants that key ownership of the bucket. Because it’s idempotent, you can leave it in the boot path without fear.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;attic-cache-server-metadata-and-tokens&quot;&gt;Attic: cache server, metadata, and tokens&lt;&#x2F;h2&gt;
&lt;p&gt;Attic sits in front of Garage and PostgreSQL. It serves the binary cache API, signs cache metadata, and issues scoped JWTs for pushing and administration.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ config, pkgs, ... }:

let
  atticApiHost = &amp;quot;nix-cache.example.com&amp;quot;;
  atticBootstrapCache = &amp;quot;myuser&amp;quot;;
  atticClient = &amp;quot;${pkgs.attic-client}&amp;#x2F;bin&amp;#x2F;attic&amp;quot;;

  atticEnv = config.sops.templates.&amp;quot;atticd-env&amp;quot;;

  atticBootstrap = pkgs.writeShellScript &amp;quot;attic-bootstrap&amp;quot; &amp;#x27;&amp;#x27;
    set -euo pipefail

    state_root=&amp;#x2F;var&amp;#x2F;lib&amp;#x2F;attic-bootstrap
    export HOME=&amp;quot;$state_root&amp;quot;
    export XDG_CONFIG_HOME=&amp;quot;$state_root&amp;#x2F;xdg&amp;quot;

    rm -rf &amp;quot;$XDG_CONFIG_HOME&amp;quot;
    mkdir -p &amp;quot;$XDG_CONFIG_HOME&amp;quot;

    token=$(&amp;#x2F;run&amp;#x2F;current-system&amp;#x2F;sw&amp;#x2F;bin&amp;#x2F;atticd-atticadm make-token \
      --sub bootstrap \
      --validity &amp;#x27;10 years&amp;#x27; \
      --pull &amp;#x27;*&amp;#x27; \
      --push &amp;#x27;*&amp;#x27; \
      --delete &amp;#x27;*&amp;#x27; \
      --create-cache &amp;#x27;*&amp;#x27; \
      --configure-cache &amp;#x27;*&amp;#x27; \
      --configure-cache-retention &amp;#x27;*&amp;#x27; \
      --destroy-cache &amp;#x27;*&amp;#x27;)

    ${atticClient} login bootstrap http:&amp;#x2F;&amp;#x2F;127.0.0.1:8080&amp;#x2F; &amp;quot;$token&amp;quot;

    if ! ${atticClient} cache info bootstrap:${atticBootstrapCache} &amp;gt;&amp;#x2F;dev&amp;#x2F;null 2&amp;gt;&amp;amp;1; then
      ${atticClient} cache create bootstrap:${atticBootstrapCache} --public
    fi

    ${atticClient} cache configure bootstrap:${atticBootstrapCache} --public
    ${atticClient} cache info bootstrap:${atticBootstrapCache} &amp;gt; &amp;quot;$state_root&amp;#x2F;${atticBootstrapCache}.info&amp;quot;
  &amp;#x27;&amp;#x27;;
in
{
  sops.secrets.attic_token_rs256_secret_base64 = {
    mode = &amp;quot;0400&amp;quot;;
    owner = &amp;quot;root&amp;quot;;
    group = &amp;quot;root&amp;quot;;
  };

  sops.templates.&amp;quot;atticd-env&amp;quot; = {
    content = &amp;#x27;&amp;#x27;
      ATTIC_SERVER_TOKEN_RS256_SECRET_BASE64=${config.sops.placeholder.attic_token_rs256_secret_base64}
      AWS_ACCESS_KEY_ID=${config.sops.placeholder.garage_attic_key_id}
      AWS_SECRET_ACCESS_KEY=${config.sops.placeholder.garage_attic_secret_key}
    &amp;#x27;&amp;#x27;;
    mode = &amp;quot;0400&amp;quot;;
    owner = &amp;quot;root&amp;quot;;
    group = &amp;quot;root&amp;quot;;
  };

  services.postgresql.ensureDatabases = [ &amp;quot;atticd&amp;quot; ];
  services.postgresql.ensureUsers = [
    {
      name = &amp;quot;atticd&amp;quot;;
      ensureDBOwnership = true;
    }
  ];

  services.atticd = {
    enable = true;
    package = pkgs.attic-server;
    environmentFile = atticEnv.path;
    settings = {
      listen = &amp;quot;127.0.0.1:8080&amp;quot;;
      allowed-hosts = [
        atticApiHost
        &amp;quot;${atticBootstrapCache}.${atticApiHost}&amp;quot;
        &amp;quot;127.0.0.1:8080&amp;quot;
        &amp;quot;localhost:8080&amp;quot;
      ];
      api-endpoint = &amp;quot;https:&amp;#x2F;&amp;#x2F;${atticApiHost}&amp;#x2F;&amp;quot;;
      substituter-endpoint = &amp;quot;https:&amp;#x2F;&amp;#x2F;${atticApiHost}&amp;#x2F;&amp;quot;;

      database.url = &amp;quot;postgres:&amp;#x2F;&amp;#x2F;&amp;#x2F;atticd?host=&amp;#x2F;run&amp;#x2F;postgresql&amp;amp;user=atticd&amp;quot;;

      storage = {
        type = &amp;quot;s3&amp;quot;;
        region = &amp;quot;garage&amp;quot;;
        bucket = &amp;quot;attic&amp;quot;;
        endpoint = &amp;quot;https:&amp;#x2F;&amp;#x2F;s3.example.com&amp;quot;;
      };
    };
  };

  systemd.services.atticd = {
    after = [ &amp;quot;garage-bootstrap.service&amp;quot; ];
    requires = [ &amp;quot;garage-bootstrap.service&amp;quot; ];
  };

  systemd.services.attic-bootstrap = {
    description = &amp;quot;Bootstrap the initial public Attic cache&amp;quot;;
    wantedBy = [ &amp;quot;multi-user.target&amp;quot; ];
    after = [ &amp;quot;atticd.service&amp;quot; ];
    requires = [ &amp;quot;atticd.service&amp;quot; ];
    path = with pkgs; [
      coreutils
      findutils
    ];
    serviceConfig = {
      Type = &amp;quot;oneshot&amp;quot;;
      RemainAfterExit = true;
      ExecStart = atticBootstrap;
      StateDirectory = &amp;quot;attic-bootstrap&amp;quot;;
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Three details matter here:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;services.atticd.settings.storage&lt;&#x2F;code&gt; points at Garage’s S3 endpoint, not local disk.&lt;&#x2F;li&gt;
&lt;li&gt;Attic’s JWT signing secret and the S3 credentials come from a &lt;code&gt;sops.templates&lt;&#x2F;code&gt; environment file, so the unit gets exactly the secrets it needs.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;attic-bootstrap&lt;&#x2F;code&gt; logs into the local API with an all-powerful bootstrap token and creates the first cache declaratively.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;That last point is the difference between “deployed a service” and “deployed a usable cache.”&lt;&#x2F;p&gt;
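&lt;p&gt;Both modules lean on pieces they assume are configured elsewhere. nginx and PostgreSQL must be enabled, and &lt;code&gt;nix-cache.example.com&lt;&#x2F;code&gt; needs a vhost in front of &lt;code&gt;atticd&lt;&#x2F;code&gt; on &lt;code&gt;127.0.0.1:8080&lt;&#x2F;code&gt;. If your configuration doesn’t have them yet, a sketch along these lines should fill the gap. It mirrors the Garage vhost. The ACME email, DNS provider, and credentials path are placeholders. Setting &lt;code&gt;acmeRoot = null&lt;&#x2F;code&gt; skips the HTTP-01 webroot and defers to &lt;code&gt;security.acme&lt;&#x2F;code&gt;, so that is where the DNS-01 provider goes:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ ... }:

{
  services.nginx.enable = true;
  services.postgresql.enable = true;

  # Reverse proxy for the Attic API, same shape as the Garage vhost.
  services.nginx.virtualHosts.&amp;quot;nix-cache.example.com&amp;quot; = {
    forceSSL = true;
    enableACME = true;
    acmeRoot = null;
    locations.&amp;quot;&amp;#x2F;&amp;quot; = {
      proxyPass = &amp;quot;http:&amp;#x2F;&amp;#x2F;127.0.0.1:8080&amp;quot;;
      extraConfig = &amp;#x27;&amp;#x27;
        # attic push streams whole NARs; don&amp;#x27;t cap or buffer request bodies.
        client_max_body_size 0;
        proxy_request_buffering off;
      &amp;#x27;&amp;#x27;;
    };
  };

  # With acmeRoot = null, certificates come from a DNS-01 challenge.
  security.acme = {
    acceptTerms = true;
    defaults = {
      email = &amp;quot;admin@example.com&amp;quot;;                       # placeholder
      dnsProvider = &amp;quot;cloudflare&amp;quot;;                        # placeholder lego provider
      environmentFile = &amp;quot;&amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;acme-dns.env&amp;quot;;  # placeholder credentials file
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;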
&lt;h2 id=&quot;the-boot-dependency-chain&quot;&gt;The boot dependency chain&lt;&#x2F;h2&gt;
&lt;p&gt;The full boot ordering looks like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;garage.service
  └─ garage-bootstrap.service
       └─ atticd.service
            └─ attic-bootstrap.service
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Garage has to be up before the S3 bucket can be created. The bucket and access key have to exist before Attic can start cleanly. Attic has to be live before you can create the initial cache through its API. Both bootstrap steps are &lt;code&gt;oneshot&lt;&#x2F;code&gt; units with &lt;code&gt;RemainAfterExit = true&lt;&#x2F;code&gt;, so each runs once per boot and then stays “active”. When a dependent unit starts or restarts later, it finds its requirement already met, and the bootstrap does not run again.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;secrets-with-sops-nix&quot;&gt;Secrets with sops-nix&lt;&#x2F;h2&gt;
&lt;p&gt;This stack has six secrets:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Secret&lt;&#x2F;th&gt;&lt;th&gt;Purpose&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;garage_rpc_secret&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Garage RPC authentication&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;garage_admin_token&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Garage admin API authentication&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;garage_metrics_token&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Garage metrics API authentication&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;garage_attic_key_id&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;S3 access key for Attic&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;garage_attic_secret_key&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;S3 secret key for Attic&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;attic_token_rs256_secret_base64&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Attic JWT signing key&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;The nice part is the direction of flow:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Garage consumes the RPC, admin, and metrics secrets directly&lt;&#x2F;li&gt;
&lt;li&gt;Garage bootstrap consumes the S3 key pair to import it and grant bucket access&lt;&#x2F;li&gt;
&lt;li&gt;Attic consumes the JWT secret and the same S3 key pair through one rendered environment file&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;No plaintext credentials in git. No hand-written &lt;code&gt;Environment=&lt;&#x2F;code&gt; lines in systemd units.&lt;&#x2F;p&gt;
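&lt;p&gt;The modules above only reference &lt;code&gt;config.sops.secrets.*&lt;&#x2F;code&gt;. They assume sops-nix already knows which encrypted file to read and how to decrypt it on the host. If that isn’t set up yet, here is a minimal sketch. It assumes a hypothetical &lt;code&gt;secrets&#x2F;nix-cache.yaml&lt;&#x2F;code&gt; whose top-level keys match the secret names in the table, encrypted to an age key derived from the host’s SSH key:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ ... }:

{
  # Hypothetical path: one encrypted YAML file holding the six secrets above.
  sops.defaultSopsFile = .&amp;#x2F;secrets&amp;#x2F;nix-cache.yaml;

  # Decrypt on the host using an age key derived from its SSH host key.
  sops.age.sshKeyPaths = [ &amp;quot;&amp;#x2F;etc&amp;#x2F;ssh&amp;#x2F;ssh_host_ed25519_key&amp;quot; ];
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The two Garage S3 key secrets are declared without an owner, so sops-nix leaves them root-owned. That works because &lt;code&gt;garage-bootstrap&lt;&#x2F;code&gt; runs as root, and &lt;code&gt;atticd&lt;&#x2F;code&gt; only sees them through the rendered environment file.&lt;&#x2F;p&gt;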
&lt;h2 id=&quot;multi-tenant-token-scoping-is-the-whole-point&quot;&gt;Multi-tenant token scoping is the whole point&lt;&#x2F;h2&gt;
&lt;p&gt;If all you wanted was a single shared cache, you could stop at “CI can push and clients can pull.” The more interesting setup is when multiple teams or projects share one Attic server without sharing one namespace.&lt;&#x2F;p&gt;
&lt;p&gt;Attic’s token generator gives you that directly.&lt;&#x2F;p&gt;
&lt;p&gt;For one personal cache:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;atticd-atticadm make-token \
  --sub myuser-push \
  --validity &amp;#x27;1 year&amp;#x27; \
  --pull myuser \
  --push myuser
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That token can read and push exactly one cache: &lt;code&gt;myuser&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;For a team namespace:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;atticd-atticadm make-token \
  --sub teamname \
  --validity &amp;#x27;1 year&amp;#x27; \
  --pull &amp;#x27;teamname-*&amp;#x27; \
  --push &amp;#x27;teamname-*&amp;#x27; \
  --create-cache &amp;#x27;teamname-*&amp;#x27; \
  --configure-cache &amp;#x27;teamname-*&amp;#x27; \
  --configure-cache-retention &amp;#x27;teamname-*&amp;#x27;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This is the private Cachix pattern. One token, one namespace prefix. The team can create &lt;code&gt;teamname-dev&lt;&#x2F;code&gt;, &lt;code&gt;teamname-staging&lt;&#x2F;code&gt;, and &lt;code&gt;teamname-prod&lt;&#x2F;code&gt; on demand. They can push to them, configure them, and change their retention settings. They can’t delete paths or destroy caches, because the token grants neither permission. And they cannot touch &lt;code&gt;otherteam-*&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;That gives you one shared Attic deployment with isolation by scoped capability rather than by running one cache server per team.&lt;&#x2F;p&gt;
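&lt;p&gt;Retention settings only take effect when a retention period applies. Attic’s server config has a &lt;code&gt;garbage-collection&lt;&#x2F;code&gt; section that sets how often GC runs and a default retention period for caches that don’t set their own. Attic’s sample server config documents a zero default, which leaves time-based expiry off unless a cache opts in. The values below are examples, not recommendations:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ ... }:

{
  services.atticd.settings.garbage-collection = {
    # How often the server runs its garbage-collection pass.
    interval = &amp;quot;12 hours&amp;quot;;
    # Retention for caches that don&amp;#x27;t set their own period.
    default-retention-period = &amp;quot;6 months&amp;quot;;
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;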
&lt;h2 id=&quot;client-side-nix-configuration&quot;&gt;Client-side Nix configuration&lt;&#x2F;h2&gt;
&lt;p&gt;Clients only need the cache’s URL (the global Attic endpoint with the cache name as the path) and that cache’s public key:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ ... }:

{
  nix.settings = {
    substituters = [
      &amp;quot;https:&amp;#x2F;&amp;#x2F;nix-cache.example.com&amp;#x2F;myuser&amp;quot;
    ];
    trusted-public-keys = [
      &amp;quot;myuser:55EJTBFbq5pCYx2tf+aR8pmVPvCmP7QlafHH90&amp;#x2F;kikw=&amp;quot;
    ];
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That’s valid for NixOS and nix-darwin alike. If you have multiple team caches, add multiple path-based substituters and their public keys.&lt;&#x2F;p&gt;
&lt;p&gt;The thing to remember is that the cache name lives in the path. Attic does not currently give you a first-class “one hostname per cache” model through &lt;code&gt;attic use&lt;&#x2F;code&gt;, so don’t design around vanity subdomains here.&lt;&#x2F;p&gt;
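&lt;p&gt;With team caches in the mix, the client list grows. The sketch below uses hypothetical &lt;code&gt;teamname-dev&lt;&#x2F;code&gt; and &lt;code&gt;teamname-prod&lt;&#x2F;code&gt; caches. Their keys are placeholders for whatever &lt;code&gt;attic cache info&lt;&#x2F;code&gt; reports for each cache. Caches created without &lt;code&gt;--public&lt;&#x2F;code&gt; also need a pull token on the client. The usual route is a netrc entry along the lines of &lt;code&gt;machine nix-cache.example.com password &amp;lt;token&amp;gt;&lt;&#x2F;code&gt;, and the file path below is an assumption:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ ... }:

{
  nix.settings = {
    substituters = [
      &amp;quot;https:&amp;#x2F;&amp;#x2F;nix-cache.example.com&amp;#x2F;myuser&amp;quot;
      &amp;quot;https:&amp;#x2F;&amp;#x2F;nix-cache.example.com&amp;#x2F;teamname-dev&amp;quot;
      &amp;quot;https:&amp;#x2F;&amp;#x2F;nix-cache.example.com&amp;#x2F;teamname-prod&amp;quot;
    ];
    trusted-public-keys = [
      &amp;quot;myuser:55EJTBFbq5pCYx2tf+aR8pmVPvCmP7QlafHH90&amp;#x2F;kikw=&amp;quot;
      # Placeholders: use the public key attic cache info reports for each cache.
      &amp;quot;&amp;lt;public key of teamname-dev&amp;gt;&amp;quot;
      &amp;quot;&amp;lt;public key of teamname-prod&amp;gt;&amp;quot;
    ];
    # Only needed for caches created without --public; hypothetical path.
    netrc-file = &amp;quot;&amp;#x2F;etc&amp;#x2F;nix&amp;#x2F;netrc&amp;quot;;
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;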
&lt;h2 id=&quot;ci-push-trusted-build-outputs-never-pr-outputs&quot;&gt;CI: push trusted build outputs, never PR outputs&lt;&#x2F;h2&gt;
&lt;p&gt;The cache gets interesting when CI writes to it automatically. The safe pattern is:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;pull from the cache on every build&lt;&#x2F;li&gt;
&lt;li&gt;push to the cache only on trusted events&lt;&#x2F;li&gt;
&lt;li&gt;keep pull requests read-only to avoid cache poisoning&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Here’s a GitHub Actions workflow that does exactly that:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;yaml&quot; class=&quot;language-yaml &quot;&gt;&lt;code class=&quot;language-yaml&quot; data-lang=&quot;yaml&quot;&gt;name: &amp;quot;Build and populate cache&amp;quot;
on:
  pull_request:
  push:
  workflow_dispatch:
  schedule:
    - cron: &amp;#x27;42 5 * * *&amp;#x27;

jobs:
  tests:
    strategy:
      matrix:
        nurRepo:
          - myuser
        nixPath:
          - nixpkgs=channel:nixos-unstable
          - nixpkgs=channel:nixpkgs-unstable
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions&amp;#x2F;checkout@v4

      - name: Install nix
        uses: cachix&amp;#x2F;install-nix-action@v31
        with:
          nix_path: &amp;quot;${{ matrix.nixPath }}&amp;quot;
          extra_nix_config: |
            experimental-features = nix-command flakes
            access-tokens = github.com=${{ secrets.GITHUB_TOKEN }}
            extra-substituters = https:&amp;#x2F;&amp;#x2F;nix-cache.example.com&amp;#x2F;myuser
            extra-trusted-public-keys = myuser:55EJTBFbq5pCYx2tf+aR8pmVPvCmP7QlafHH90&amp;#x2F;kikw=

      - name: Show nixpkgs version
        run: nix-instantiate --eval -E &amp;#x27;(import &amp;lt;nixpkgs&amp;gt; {}).lib.version&amp;#x27;

      - name: Login to Attic
        if: github.event_name != &amp;#x27;pull_request&amp;#x27;
        run: nix shell nixpkgs#attic-client -c attic login ci https:&amp;#x2F;&amp;#x2F;nix-cache.example.com&amp;#x2F; &amp;quot;${{ secrets.ATTIC_TOKEN }}&amp;quot;

      - name: Build nix packages
        run: nix shell nixpkgs#nix-build-uncached -c nix-build-uncached ci.nix -A cacheOutputs

      - name: Push build outputs to Attic
        if: github.event_name != &amp;#x27;pull_request&amp;#x27;
        run: nix shell nixpkgs#attic-client -c sh -lc &amp;#x27;attic push ci:myuser result*&amp;#x27;

      - name: Trigger NUR update
        if: ${{ matrix.nurRepo != &amp;#x27;&amp;lt;YOUR_REPO_NAME&amp;gt;&amp;#x27; }}
        run: curl -XPOST &amp;quot;https:&amp;#x2F;&amp;#x2F;nur-update.nix-community.org&amp;#x2F;update?repo=${{ matrix.nurRepo }}&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The key behavior:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;PRs can use the cache as a substituter, but they do not get an Attic login token&lt;&#x2F;li&gt;
&lt;li&gt;pushes, scheduled builds, and manual runs can log in and push results&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;ATTIC_TOKEN&lt;&#x2F;code&gt; should be a scoped token, not a bootstrap token&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;nix-build-uncached&lt;&#x2F;code&gt; avoids pointlessly rebuilding outputs that are already available from substituters&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;If you’re handing out CI tokens per repository or per team, use the same wildcard scoping model as the human tokens. &lt;code&gt;repo-a-*&lt;&#x2F;code&gt; and &lt;code&gt;repo-b-*&lt;&#x2F;code&gt; stay isolated even though they hit the same server.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;operating-model&quot;&gt;Operating model&lt;&#x2F;h2&gt;
&lt;p&gt;Once this is deployed, the workflow is simple:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Garage stores the blobs.&lt;&#x2F;li&gt;
&lt;li&gt;Attic serves the cache API and tracks metadata in PostgreSQL.&lt;&#x2F;li&gt;
&lt;li&gt;Teams get scoped tokens limited to their namespace.&lt;&#x2F;li&gt;
&lt;li&gt;Developers add the cache URL and public key to &lt;code&gt;nix.settings&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;CI logs in on trusted runs and pushes build outputs.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;That gives you most of what people actually want from Cachix:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;faster CI&lt;&#x2F;li&gt;
&lt;li&gt;faster local builds&lt;&#x2F;li&gt;
&lt;li&gt;a shared cache for private code&lt;&#x2F;li&gt;
&lt;li&gt;scoped write access&lt;&#x2F;li&gt;
&lt;li&gt;one central service instead of ad-hoc per-project buckets&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;But you keep control of the storage, keys, and tenancy model.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;one-limitation-worth-planning-around&quot;&gt;One limitation worth planning around&lt;&#x2F;h2&gt;
&lt;p&gt;Attic still exposes one global &lt;code&gt;substituter-endpoint&lt;&#x2F;code&gt;. &lt;code&gt;attic use &amp;lt;cache&amp;gt;&lt;&#x2F;code&gt; advertises a path-based URL like &lt;code&gt;https:&#x2F;&#x2F;nix-cache.example.com&#x2F;myuser&lt;&#x2F;code&gt;, not a dedicated hostname per cache. So if you’re thinking “I’ll front each cache with its own vanity nginx vhost,” stop there. That’s not the product surface Attic exposes today.&lt;&#x2F;p&gt;
&lt;p&gt;Path-based cache names are the stable interface. Design your client config, CI config, and team docs around that and the setup stays boring in the good way.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;final-thought&quot;&gt;Final thought&lt;&#x2F;h2&gt;
&lt;p&gt;This stack hits a nice balance. Garage is lightweight enough for single-node self-hosting. Attic gives you a real multi-user cache server instead of raw object storage. PostgreSQL handles the metadata cleanly. And scoped tokens let you share one service across teams without turning it into a free-for-all.&lt;&#x2F;p&gt;
&lt;p&gt;If you already run NixOS and sops-nix, the whole thing fits naturally into the rest of your infrastructure. Which is the main reason to do it this way in the first place.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Nix-Wrapped Android Release Builds with Out-of-Repo Signing</title>
        <published>2026-03-30T12:00:00+00:00</published>
        <updated>2026-03-30T12:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/nix-wrapped-android-release-builds/"/>
        <id>https://perlpimp.net/blog/nix-wrapped-android-release-builds/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/nix-wrapped-android-release-builds/">&lt;p&gt;Android release builds have three annoyances that like to get tangled together:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;The Android SDK is huge, version-sensitive, and annoying to install consistently.&lt;&#x2F;li&gt;
&lt;li&gt;Release signing needs a keystore and credentials that absolutely should not live in the repo.&lt;&#x2F;li&gt;
&lt;li&gt;Flutter build commands tend to accrete tribal shell setup until nobody remembers what is actually required.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Nix solves the first and third problems nicely. For the second, the pattern I like is simple: keep signing material under &lt;code&gt;~&#x2F;.config&#x2F;your-app&#x2F;&lt;&#x2F;code&gt;, and have the flake symlink it into the place Gradle expects at build time.&lt;&#x2F;p&gt;
&lt;p&gt;That gives you a release workflow where:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;the SDK version is pinned&lt;&#x2F;li&gt;
&lt;li&gt;the build command is repeatable&lt;&#x2F;li&gt;
&lt;li&gt;the signing config stays outside git&lt;&#x2F;li&gt;
&lt;li&gt;new machines bootstrap from &lt;code&gt;nix develop&lt;&#x2F;code&gt; instead of a README full of imperative setup&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Here’s the pattern.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-shape-of-it&quot;&gt;The shape of it&lt;&#x2F;h2&gt;
&lt;p&gt;The repo stays clean. Secrets live in your home directory:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;~&amp;#x2F;.config&amp;#x2F;myapp&amp;#x2F;
├── android-signing.properties
└── upload.jks

myapp&amp;#x2F;
├── flake.nix
├── android&amp;#x2F;
│   ├── .gitignore
│   └── app&amp;#x2F;build.gradle.kts
└── ...
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;android-signing.properties&lt;&#x2F;code&gt; contains the passwords and points at the keystore using an absolute path. The repo never stores either file. The build script links the properties file into &lt;code&gt;android&#x2F;key.properties&lt;&#x2F;code&gt; right before running the release build.&lt;&#x2F;p&gt;
&lt;p&gt;That one decision removes most of the risk. You stop playing games with encrypted blobs in app repos, and Gradle still gets the file layout it expects.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;step-1-pin-the-android-sdk-in-the-flake&quot;&gt;Step 1: pin the Android SDK in the flake&lt;&#x2F;h2&gt;
&lt;p&gt;The core idea is to compose the Android SDK declaratively in &lt;code&gt;flake.nix&lt;&#x2F;code&gt;, then export the usual Android environment variables from the dev shell:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;devShells = forAllSystems (system:
  let
    pkgs = import nixpkgs {
      inherit system;
      config = {
        allowUnfree = true;
        android_sdk.accept_license = true;
      };
    };

    androidComposition =
      pkgs.androidenv.composeAndroidPackages {
        buildToolsVersions = [ &amp;quot;28.0.3&amp;quot; &amp;quot;35.0.0&amp;quot; ];
        platformVersions = [ &amp;quot;36&amp;quot; &amp;quot;35&amp;quot; &amp;quot;34&amp;quot; &amp;quot;33&amp;quot; ];
        includeNDK = true;
        ndkVersions = [ &amp;quot;28.2.13676358&amp;quot; ];
        cmakeVersions = [ &amp;quot;3.22.1&amp;quot; ];
        includeEmulator = true;
        includeSystemImages = true;
        abiVersions = [ &amp;quot;arm64-v8a&amp;quot; &amp;quot;x86_64&amp;quot; ];
        systemImageTypes = [ &amp;quot;google_apis&amp;quot; ];
        includeSources = false;
      };

    androidSdk = androidComposition.androidsdk;
  in {
    default = pkgs.mkShellNoCC {
      packages = [
        pkgs.flutter
        androidSdk
        pkgs.jdk17
        # wrapper scripts from Steps 4 and 5, bound in the same let block
        release-android
        emu-phone
        emu-tablet
      ];

      shellHook = &amp;#x27;&amp;#x27;
        export ANDROID_HOME=&amp;quot;${androidSdk}&amp;#x2F;libexec&amp;#x2F;android-sdk&amp;quot;
        export ANDROID_SDK_ROOT=&amp;quot;$ANDROID_HOME&amp;quot;
        export JAVA_HOME=&amp;quot;${pkgs.jdk17}&amp;quot;
      &amp;#x27;&amp;#x27;;
    };
  });
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The important bits:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;android_sdk.accept_license = true&lt;&#x2F;code&gt; has to be set in &lt;code&gt;config&lt;&#x2F;code&gt;, or the SDK composition fails&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;ANDROID_HOME&lt;&#x2F;code&gt;, &lt;code&gt;ANDROID_SDK_ROOT&lt;&#x2F;code&gt;, and &lt;code&gt;JAVA_HOME&lt;&#x2F;code&gt; should all be exported explicitly&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;buildToolsVersions&lt;&#x2F;code&gt;, &lt;code&gt;platformVersions&lt;&#x2F;code&gt;, &lt;code&gt;ndkVersions&lt;&#x2F;code&gt;, and &lt;code&gt;cmakeVersions&lt;&#x2F;code&gt; should be pinned instead of left implicit&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;This is the whole point of using Nix here. You are replacing “install whatever Android Studio pulls today” with a concrete, reviewable SDK definition.&lt;&#x2F;p&gt;
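&lt;p&gt;To confirm the dev shell really exports what Gradle and Flutter look for, a small check run inside &lt;code&gt;nix develop&lt;&#x2F;code&gt; is enough. This is a sketch, not part of the flake: the function name &lt;code&gt;check_android_env&lt;&#x2F;code&gt; is made up, and it only verifies that the three variables are set and point at existing directories.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: check that the variables the dev shell should export
# are set and point at real directories.
check_android_env() {
  local var status=0
  for var in ANDROID_HOME ANDROID_SDK_ROOT JAVA_HOME; do
    # ${!var} is bash indirect expansion: the value of the variable named in $var
    if [ -z &amp;quot;${!var:-}&amp;quot; ]; then
      echo &amp;quot;missing: $var&amp;quot; &amp;gt;&amp;amp;2
      status=1
    elif [ ! -d &amp;quot;${!var}&amp;quot; ]; then
      echo &amp;quot;not a directory: $var=${!var}&amp;quot; &amp;gt;&amp;amp;2
      status=1
    fi
  done
  return &amp;quot;$status&amp;quot;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A non-zero exit status usually means the command ran outside the dev shell, or the &lt;code&gt;shellHook&lt;&#x2F;code&gt; points at a path that does not exist.&lt;&#x2F;p&gt;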
&lt;h2 id=&quot;step-2-keep-signing-config-outside-the-repo&quot;&gt;Step 2: keep signing config outside the repo&lt;&#x2F;h2&gt;
&lt;p&gt;Under &lt;code&gt;~&#x2F;.config&#x2F;myapp&#x2F;&lt;&#x2F;code&gt;, create a properties file like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;properties&quot; class=&quot;language-properties &quot;&gt;&lt;code class=&quot;language-properties&quot; data-lang=&quot;properties&quot;&gt;storePassword=your-store-password
keyPassword=your-key-password
keyAlias=upload
storeFile=&amp;#x2F;Users&amp;#x2F;yourname&amp;#x2F;.config&amp;#x2F;myapp&amp;#x2F;upload.jks
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then generate the keystore once:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;keytool -genkey -v -keystore ~&amp;#x2F;.config&amp;#x2F;myapp&amp;#x2F;upload.jks \
  -keyalg RSA -keysize 2048 -validity 10000 -alias upload
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Two details matter here:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;storeFile&lt;&#x2F;code&gt; should be an absolute path&lt;&#x2F;li&gt;
&lt;li&gt;both files should live somewhere user-private, not under the project tree&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;You can move these wherever you like, but &lt;code&gt;~&#x2F;.config&#x2F;myapp&#x2F;&lt;&#x2F;code&gt; is an easy convention to remember and document.&lt;&#x2F;p&gt;
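&lt;p&gt;Both details are easy to get subtly wrong, so a pre-flight check can help. A hypothetical version (the name &lt;code&gt;check_signing_props&lt;&#x2F;code&gt; is mine) could look like this, assuming the plain &lt;code&gt;key=value&lt;&#x2F;code&gt; format shown above. It also rejects a &lt;code&gt;~&lt;&#x2F;code&gt; path, since neither Java nor Gradle expands it.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical pre-flight check for the properties file described above.
# Assumes plain key=value lines as in the example (no quoting or escapes).
check_signing_props() {
  local props=&amp;quot;$1&amp;quot; key store mode
  [ -f &amp;quot;$props&amp;quot; ] || { echo &amp;quot;no such file: $props&amp;quot; &amp;gt;&amp;amp;2; return 1; }

  for key in storePassword keyPassword keyAlias storeFile; do
    grep -q &amp;quot;^${key}=.&amp;quot; &amp;quot;$props&amp;quot; \
      || { echo &amp;quot;missing or empty: $key&amp;quot; &amp;gt;&amp;amp;2; return 1; }
  done

  # Everything after the first &amp;#x27;=&amp;#x27; on the storeFile line.
  store=&amp;quot;$(grep &amp;#x27;^storeFile=&amp;#x27; &amp;quot;$props&amp;quot; | head -n 1 | cut -d= -f2-)&amp;quot;
  case &amp;quot;$store&amp;quot; in
    &amp;#x2F;*) ;;  # absolute; a relative path would resolve against the Gradle project
    *) echo &amp;quot;storeFile is not absolute: $store&amp;quot; &amp;gt;&amp;amp;2; return 1 ;;
  esac
  [ -f &amp;quot;$store&amp;quot; ] || { echo &amp;quot;keystore not found: $store&amp;quot; &amp;gt;&amp;amp;2; return 1; }

  # Warn (but don&amp;#x27;t fail) if group or others have any permission bits.
  mode=&amp;quot;$(ls -l &amp;quot;$props&amp;quot; | cut -c5-10)&amp;quot;
  [ &amp;quot;$mode&amp;quot; = &amp;quot;------&amp;quot; ] \
    || echo &amp;quot;warning: $props is accessible to others; try chmod 600&amp;quot; &amp;gt;&amp;amp;2
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Run it as &lt;code&gt;check_signing_props ~&#x2F;.config&#x2F;myapp&#x2F;android-signing.properties&lt;&#x2F;code&gt;. The permission test only warns, because what counts as private enough is your call.&lt;&#x2F;p&gt;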
&lt;h2 id=&quot;step-3-let-gradle-read-key-properties-if-it-exists&quot;&gt;Step 3: let Gradle read &lt;code&gt;key.properties&lt;&#x2F;code&gt; if it exists&lt;&#x2F;h2&gt;
&lt;p&gt;On the Gradle side, the app should load &lt;code&gt;android&#x2F;key.properties&lt;&#x2F;code&gt; when present and define the release signing config from it.&lt;&#x2F;p&gt;
&lt;p&gt;In &lt;code&gt;android&#x2F;app&#x2F;build.gradle.kts&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;kotlin&quot; class=&quot;language-kotlin &quot;&gt;&lt;code class=&quot;language-kotlin&quot; data-lang=&quot;kotlin&quot;&gt;import java.io.FileInputStream
import java.util.Properties

val keystoreProperties = Properties()
val keystorePropertiesFile = rootProject.file(&amp;quot;key.properties&amp;quot;)

if (keystorePropertiesFile.exists()) {
    keystoreProperties.load(FileInputStream(keystorePropertiesFile))
}

android {
    signingConfigs {
        if (keystorePropertiesFile.exists()) {
            create(&amp;quot;release&amp;quot;) {
                keyAlias = keystoreProperties[&amp;quot;keyAlias&amp;quot;] as String
                keyPassword = keystoreProperties[&amp;quot;keyPassword&amp;quot;] as String
                storeFile = file(keystoreProperties[&amp;quot;storeFile&amp;quot;] as String)
                storePassword = keystoreProperties[&amp;quot;storePassword&amp;quot;] as String
            }
        }
    }

    buildTypes {
        release {
            signingConfig = if (keystorePropertiesFile.exists()) {
                signingConfigs.getByName(&amp;quot;release&amp;quot;)
            } else {
                signingConfigs.getByName(&amp;quot;debug&amp;quot;)
            }
        }
    }
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And in &lt;code&gt;android&#x2F;.gitignore&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;gitignore&quot; class=&quot;language-gitignore &quot;&gt;&lt;code class=&quot;language-gitignore&quot; data-lang=&quot;gitignore&quot;&gt;key.properties
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
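&lt;p&gt;The fallback to &lt;code&gt;debug&lt;&#x2F;code&gt; signing is optional, but useful. It means the project still builds for local testing even when the real release credentials are absent. The flip side is that a missing &lt;code&gt;key.properties&lt;&#x2F;code&gt; quietly yields a debug-signed bundle, which Google Play rejects, so the release wrapper should check for the signing file before it builds anything.&lt;&#x2F;p&gt;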
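&lt;p&gt;The whole scheme rests on that ignore rule, so it can be worth checking mechanically, for example in CI or a pre-commit hook. A sketch, assuming it runs inside the project’s git checkout; the function name is hypothetical.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical guard: fail if android&amp;#x2F;key.properties could end up in a commit.
assert_key_properties_ignored() {
  local root
  root=&amp;quot;$(git rev-parse --show-toplevel)&amp;quot; || return 1
  # An ignore rule does not untrack a file that was committed earlier.
  if git -C &amp;quot;$root&amp;quot; ls-files --error-unmatch android&amp;#x2F;key.properties &amp;gt;&amp;#x2F;dev&amp;#x2F;null 2&amp;gt;&amp;amp;1; then
    echo &amp;quot;android&amp;#x2F;key.properties is tracked; run git rm --cached on it&amp;quot; &amp;gt;&amp;amp;2
    return 1
  fi
  # check-ignore exits 0 only when the path matches an ignore rule.
  if ! git -C &amp;quot;$root&amp;quot; check-ignore -q android&amp;#x2F;key.properties; then
    echo &amp;quot;android&amp;#x2F;key.properties is not ignored by git&amp;quot; &amp;gt;&amp;amp;2
    return 1
  fi
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The tracked-file check comes first because an ignore rule has no effect on a file that was committed before the rule existed, and &lt;code&gt;git check-ignore&lt;&#x2F;code&gt; does not report tracked files as ignored.&lt;&#x2F;p&gt;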
&lt;h2 id=&quot;step-4-wrap-the-release-build-in-nix&quot;&gt;Step 4: wrap the release build in Nix&lt;&#x2F;h2&gt;
&lt;p&gt;Now the useful part: hide the shell setup and symlink dance behind a single command.&lt;&#x2F;p&gt;
&lt;p&gt;I like doing this with &lt;code&gt;writeShellScriptBin&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;release-android =
  pkgs.writeShellScriptBin &amp;quot;release-android&amp;quot; &amp;#x27;&amp;#x27;
    set -euo pipefail

    export ANDROID_HOME=&amp;quot;${androidSdk}&amp;#x2F;libexec&amp;#x2F;android-sdk&amp;quot;
    export ANDROID_SDK_ROOT=&amp;quot;$ANDROID_HOME&amp;quot;
    export JAVA_HOME=&amp;quot;${pkgs.jdk17}&amp;quot;

    root=&amp;quot;$(${pkgs.git}&amp;#x2F;bin&amp;#x2F;git rev-parse --show-toplevel 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null || pwd)&amp;quot;
    signing=&amp;quot;$HOME&amp;#x2F;.config&amp;#x2F;myapp&amp;#x2F;android-signing.properties&amp;quot;

    if [ ! -f &amp;quot;$signing&amp;quot; ]; then
      echo &amp;quot;error: signing config not found&amp;quot; &amp;gt;&amp;amp;2
      echo &amp;gt;&amp;amp;2
      echo &amp;quot;Create $signing with:&amp;quot; &amp;gt;&amp;amp;2
      echo &amp;quot;  storePassword=&amp;lt;your store password&amp;gt;&amp;quot; &amp;gt;&amp;amp;2
      echo &amp;quot;  keyPassword=&amp;lt;your key password&amp;gt;&amp;quot; &amp;gt;&amp;amp;2
      echo &amp;quot;  keyAlias=upload&amp;quot; &amp;gt;&amp;amp;2
      echo &amp;quot;  storeFile=&amp;#x2F;absolute&amp;#x2F;path&amp;#x2F;to&amp;#x2F;keystore.jks&amp;quot; &amp;gt;&amp;amp;2
      exit 1
    fi

    ln -sf &amp;quot;$signing&amp;quot; &amp;quot;$root&amp;#x2F;android&amp;#x2F;key.properties&amp;quot;
    echo &amp;quot;linked signing config&amp;quot;

    cd &amp;quot;$root&amp;quot;
    flutter clean
    flutter pub get
    flutter build appbundle --release \
      --dart-define=GRPC_HOST=api.prod.example.com \
      --dart-define=GRPC_PORT=443 \
      --dart-define=GRPC_USE_TLS=true \
      &amp;quot;$@&amp;quot;

    echo
    echo &amp;quot;Bundle: build&amp;#x2F;app&amp;#x2F;outputs&amp;#x2F;bundle&amp;#x2F;release&amp;#x2F;app-release.aab&amp;quot;
  &amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This script does four things:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Exports the Android and Java paths expected by Flutter and Gradle.&lt;&#x2F;li&gt;
&lt;li&gt;Verifies that the signing config exists before doing any expensive work.&lt;&#x2F;li&gt;
&lt;li&gt;Symlinks the external properties file into &lt;code&gt;android&#x2F;key.properties&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;Runs the release build with whatever &lt;code&gt;--dart-define&lt;&#x2F;code&gt; values production needs.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;That is a much better interface than “first open the right shell, then export three variables, then remember the path to the signing file, then run the exact build incantation.”&lt;&#x2F;p&gt;
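&lt;p&gt;It also means CI or another developer can run the same command and get the same behaviour, assuming they have a copy of the upload keystore and its properties file in place. Google Play expects every upload for an app to be signed with its registered upload key, so a freshly generated keystore on another machine will not do.&lt;&#x2F;p&gt;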
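&lt;p&gt;One variation I’d consider, though it is not what the script above does: remove the &lt;code&gt;key.properties&lt;&#x2F;code&gt; link once the build finishes, so the path to the secrets only exists in the project tree during a build. A sketch with a hypothetical helper name; it avoids &lt;code&gt;${...}&lt;&#x2F;code&gt; so it could be pasted into the Nix &lt;code&gt;&#x27;&#x27;&lt;&#x2F;code&gt; string unchanged.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical variation on the wrapper: keep the key.properties link only for
# the duration of one command instead of leaving it in the working tree.
with_signing_link() (
  signing=&amp;quot;$1&amp;quot;; root=&amp;quot;$2&amp;quot;; shift 2
  link=&amp;quot;$root&amp;#x2F;android&amp;#x2F;key.properties&amp;quot;
  # The body is a subshell, so this EXIT trap runs when the subshell exits,
  # whether the command succeeded or failed, and the exit status is kept.
  trap &amp;#x27;rm -f &amp;quot;$link&amp;quot;&amp;#x27; EXIT
  ln -sf &amp;quot;$signing&amp;quot; &amp;quot;$link&amp;quot; || exit 1
  &amp;quot;$@&amp;quot;
)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;In &lt;code&gt;release-android&lt;&#x2F;code&gt;, the &lt;code&gt;ln -sf&lt;&#x2F;code&gt; line would go away and the build would become &lt;code&gt;with_signing_link &quot;$signing&quot; &quot;$root&quot; flutter build appbundle --release&lt;&#x2F;code&gt; followed by the same &lt;code&gt;--dart-define&lt;&#x2F;code&gt; flags. Leaving the link in place, as the original does, is also fine: it is gitignored either way.&lt;&#x2F;p&gt;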
&lt;h2 id=&quot;step-5-add-emulator-helpers-while-you-re-here&quot;&gt;Step 5: add emulator helpers while you’re here&lt;&#x2F;h2&gt;
&lt;p&gt;Once you’re already pinning the SDK, adding emulator launchers is cheap and useful.&lt;&#x2F;p&gt;
&lt;p&gt;Here’s a small helper that creates an AVD only if it doesn’t already exist, then launches it:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;abi = if pkgs.stdenv.hostPlatform.isAarch64
  then &amp;quot;arm64-v8a&amp;quot;
  else &amp;quot;x86_64&amp;quot;;

mkEmulator = { name, avdName, device, apiLevel, desc }:
  pkgs.writeShellScriptBin name &amp;#x27;&amp;#x27;
    set -euo pipefail

    export ANDROID_HOME=&amp;quot;${androidSdk}&amp;#x2F;libexec&amp;#x2F;android-sdk&amp;quot;
    export ANDROID_SDK_ROOT=&amp;quot;$ANDROID_HOME&amp;quot;
    export PATH=&amp;quot;${pkgs.jdk17}&amp;#x2F;bin:$PATH&amp;quot;

    AVD_NAME=&amp;quot;${avdName}&amp;quot;
    SYSTEM_IMAGE=&amp;quot;system-images;android-${apiLevel};google_apis;${abi}&amp;quot;

    if ! &amp;quot;$ANDROID_HOME&amp;#x2F;emulator&amp;#x2F;emulator&amp;quot; -list-avds 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null \
         | grep -qx &amp;quot;$AVD_NAME&amp;quot;; then
      echo &amp;quot;Creating ${desc}...&amp;quot;
      echo &amp;quot;no&amp;quot; | &amp;quot;$ANDROID_HOME&amp;quot;&amp;#x2F;cmdline-tools&amp;#x2F;*&amp;#x2F;bin&amp;#x2F;avdmanager \
        create avd -n &amp;quot;$AVD_NAME&amp;quot; -k &amp;quot;$SYSTEM_IMAGE&amp;quot; \
        -d &amp;quot;${device}&amp;quot; --force
    fi

    echo &amp;quot;Launching ${desc}...&amp;quot;
    exec &amp;quot;$ANDROID_HOME&amp;#x2F;emulator&amp;#x2F;emulator&amp;quot; -avd &amp;quot;$AVD_NAME&amp;quot; &amp;quot;$@&amp;quot;
  &amp;#x27;&amp;#x27;;

emu-phone = mkEmulator {
  name = &amp;quot;emu-phone&amp;quot;;
  avdName = &amp;quot;myapp_phone&amp;quot;;
  device = &amp;quot;pixel_7&amp;quot;;
  apiLevel = &amp;quot;35&amp;quot;;
  desc = &amp;quot;Pixel 7 API 35&amp;quot;;
};

emu-tablet = mkEmulator {
  name = &amp;quot;emu-tablet&amp;quot;;
  avdName = &amp;quot;myapp_tablet&amp;quot;;
  device = &amp;quot;pixel_tablet&amp;quot;;
  apiLevel = &amp;quot;34&amp;quot;;
  desc = &amp;quot;Pixel Tablet API 34&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The nice thing here is not just convenience. It’s that the emulator definition becomes part of the same reproducible environment as the SDK itself. Team members are not manually creating mystery AVDs with slightly different images and wondering why behaviour differs.&lt;&#x2F;p&gt;
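&lt;p&gt;The idempotence rests on the &lt;code&gt;-list-avds&lt;&#x2F;code&gt; check piped into &lt;code&gt;grep -qx&lt;&#x2F;code&gt;. Pulled out as a standalone function (the name is hypothetical), the matching rules are easy to try without an SDK; &lt;code&gt;-F&lt;&#x2F;code&gt; is a small hardening I’d add, not something the script above does.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# The existence test from mkEmulator, pulled out so it can be tried on its own.
# -x: the whole line must match, so myapp_phone does not match myapp_phone_old.
# -F: fixed-string match (an addition; the script above uses a regex), so a
#     &amp;quot;.&amp;quot; in an AVD name is not treated as a wildcard.
avd_listed() {
  grep -qxF -- &amp;quot;$1&amp;quot;
}

# Usage: emulator -list-avds | avd_listed myapp_phone
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;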
&lt;h2 id=&quot;what-usage-looks-like&quot;&gt;What usage looks like&lt;&#x2F;h2&gt;
&lt;p&gt;From a clean machine, the workflow becomes:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Enter the project environment
nix develop

# Run an emulator (it stays in the foreground; build from a second terminal)
emu-phone

# Build the signed Android app bundle
release-android
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The result lands at:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;text&quot; class=&quot;language-text &quot;&gt;&lt;code class=&quot;language-text&quot; data-lang=&quot;text&quot;&gt;build&amp;#x2F;app&amp;#x2F;outputs&amp;#x2F;bundle&amp;#x2F;release&amp;#x2F;app-release.aab
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That’s the artifact you upload to Google Play.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-this-pattern-is-worth-keeping&quot;&gt;Why this pattern is worth keeping&lt;&#x2F;h2&gt;
&lt;p&gt;The main win is not “you can build Android with Nix.” You can do that a dozen messy ways. The win is that responsibilities get separated cleanly:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Nix owns SDK provisioning and wrapper scripts&lt;&#x2F;li&gt;
&lt;li&gt;Gradle owns signing configuration and release build logic&lt;&#x2F;li&gt;
&lt;li&gt;your home directory owns secrets&lt;&#x2F;li&gt;
&lt;li&gt;the repo stays free of machine-local credentials&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That separation scales better than the usual alternative, where every developer has a slightly different Android setup and the release process is half shell history, half folklore.&lt;&#x2F;p&gt;
&lt;p&gt;If you’re already using flakes for your dev environment, Android release builds fit into that model surprisingly well. Put the SDK in the flake, put the signing material under &lt;code&gt;~&#x2F;.config&lt;&#x2F;code&gt;, wrap the build in one script, and stop treating release day as a special machine ceremony.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Prometheus Ecowitt Exporter on NixOS</title>
        <published>2026-03-29T12:00:00+00:00</published>
        <updated>2026-03-29T12:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/prometheus-ecowitt-exporter-nixos/"/>
        <id>https://perlpimp.net/blog/prometheus-ecowitt-exporter-nixos/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/prometheus-ecowitt-exporter-nixos/">&lt;p&gt;Ecowitt weather stations are solid hardware. The sensors are reliable, the gateways are cheap, and the ecosystem covers everything from soil moisture to lightning detection. The cloud platform, though — that’s where it falls apart. Limited retention, clunky dashboards, no alerting worth mentioning, and your data sitting on someone else’s servers. You already run Prometheus and Grafana for everything else. You just need a bridge.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;ijohanne&#x2F;prometheus-ecowitt-exporter&quot;&gt;prometheus-ecowitt-exporter&lt;&#x2F;a&gt; is that bridge. It’s a Rust service that receives HTTP POSTs from your Ecowitt gateway, converts the readings into Prometheus metrics, and exposes them on a &lt;code&gt;&#x2F;metrics&lt;&#x2F;code&gt; endpoint. It can also forward the raw data to other HTTP endpoints — Home Assistant, a second exporter, anything that accepts an HTTP POST — working around the gateway’s single-destination limitation. It ships with a NixOS module that wires everything up — including optional Prometheus scrape config and a Grafana dashboard.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-the-data-flows&quot;&gt;How the data flows&lt;&#x2F;h2&gt;
&lt;p&gt;The path from sensor to graph looks like this:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;Ecowitt sensors
  ↓ RF 433&amp;#x2F;868&amp;#x2F;915MHz
Gateway (GW1100, etc.)
  ↓ HTTP POST
prometheus-ecowitt-exporter ──→ Home Assistant, other receivers
  ↓ &amp;#x2F;metrics
Prometheus
  ↓ query
Grafana
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Your sensors transmit over RF to the gateway. The gateway — typically a GW1100 or similar — supports “customized” weather services, which is really just an HTTP POST to an arbitrary endpoint on an interval you choose. The exporter receives those posts, parses the Ecowitt protocol, applies unit conversions, and serves the results as Prometheus metrics. It can also forward the raw data to other HTTP endpoints — more on that below. Standard scrape-and-graph from there.&lt;&#x2F;p&gt;
&lt;p&gt;No polling. No API keys. No cloud dependency. The gateway pushes directly to your host.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;getting-the-module&quot;&gt;Getting the module&lt;&#x2F;h2&gt;
&lt;p&gt;You have two options for pulling this into your NixOS configuration: directly from the flake, or through NUR.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;direct-flake-input&quot;&gt;Direct flake input&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs = {
    nixpkgs.url = &amp;quot;github:NixOS&amp;#x2F;nixpkgs&amp;#x2F;nixos-unstable&amp;quot;;
    prometheus-ecowitt-exporter.url = &amp;quot;github:ijohanne&amp;#x2F;prometheus-ecowitt-exporter&amp;quot;;
  };

  outputs = { nixpkgs, prometheus-ecowitt-exporter, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      modules = [
        prometheus-ecowitt-exporter.nixosModules.default
        .&amp;#x2F;configuration.nix
      ];
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This pins you to a specific revision of the exporter. You control when you update. Straightforward.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;via-nur&quot;&gt;Via NUR&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs.nur.url = &amp;quot;github:nix-community&amp;#x2F;NUR&amp;quot;;

  outputs = { self, nixpkgs, nur, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      modules = [
        nur.modules.nixos.repos.ijohanne.prometheus-ecowitt-exporter
        .&amp;#x2F;configuration.nix
      ];
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Same module, different delivery mechanism. If you already use NUR for other packages, this keeps your inputs list shorter.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;nixos-configuration&quot;&gt;NixOS configuration&lt;&#x2F;h2&gt;
&lt;p&gt;Here’s a full configuration with all the knobs visible:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  services.prometheus-ecowitt-exporter = {
    enable = true;
    port = 8088;
    temperatureUnit = &amp;quot;c&amp;quot;;
    pressureUnit = &amp;quot;hpa&amp;quot;;
    windUnit = &amp;quot;kmh&amp;quot;;
    rainUnit = &amp;quot;mm&amp;quot;;
    distanceUnit = &amp;quot;km&amp;quot;;
    irradianceUnit = &amp;quot;wm2&amp;quot;;
    aqiStandard = &amp;quot;epa&amp;quot;;
    outdoorLocation = &amp;quot;garden&amp;quot;;
    indoorLocation = &amp;quot;living-room&amp;quot;;
    forwardUrls = [
      &amp;quot;http:&amp;#x2F;&amp;#x2F;homeassistant.local:8123&amp;#x2F;api&amp;#x2F;webhook&amp;#x2F;ecowitt&amp;quot;
    ];
    enableLocalScraping = true;
    enableGrafanaDashboard = true;
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Two options deserve special attention. &lt;code&gt;enableLocalScraping&lt;&#x2F;code&gt; automatically adds a Prometheus scrape target for the exporter — no need to manually edit your &lt;code&gt;scrapeConfigs&lt;&#x2F;code&gt;. &lt;code&gt;enableGrafanaDashboard&lt;&#x2F;code&gt; provisions a Grafana dashboard that covers all the metric types the exporter produces. Both save you the tedium of wiring things together by hand.&lt;&#x2F;p&gt;
&lt;p&gt;The service runs as a dedicated &lt;code&gt;ecowitt-exporter&lt;&#x2F;code&gt; user by default. You can override &lt;code&gt;user&lt;&#x2F;code&gt; and &lt;code&gt;group&lt;&#x2F;code&gt; if you have opinions about that. The &lt;code&gt;listenAddress&lt;&#x2F;code&gt; defaults to &lt;code&gt;0.0.0.0&lt;&#x2F;code&gt; — change it if you want to bind to a specific interface.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;unit-conversions&quot;&gt;Unit conversions&lt;&#x2F;h2&gt;
&lt;p&gt;The exporter handles all unit conversion server-side. You configure it once and every metric comes out in the units you actually want.&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Measurement&lt;&#x2F;th&gt;&lt;th&gt;Default&lt;&#x2F;th&gt;&lt;th&gt;Options&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Temperature&lt;&#x2F;td&gt;&lt;td&gt;c&lt;&#x2F;td&gt;&lt;td&gt;c, f, k&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Pressure&lt;&#x2F;td&gt;&lt;td&gt;hpa&lt;&#x2F;td&gt;&lt;td&gt;hpa, inhg, mmhg&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Wind&lt;&#x2F;td&gt;&lt;td&gt;kmh&lt;&#x2F;td&gt;&lt;td&gt;kmh, mph, ms, knots, fps&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Rain&lt;&#x2F;td&gt;&lt;td&gt;mm&lt;&#x2F;td&gt;&lt;td&gt;mm, in&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Distance&lt;&#x2F;td&gt;&lt;td&gt;km&lt;&#x2F;td&gt;&lt;td&gt;km, mi&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;Irradiance&lt;&#x2F;td&gt;&lt;td&gt;wm2&lt;&#x2F;td&gt;&lt;td&gt;wm2, lx, fc&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;AQI Standard&lt;&#x2F;td&gt;&lt;td&gt;epa&lt;&#x2F;td&gt;&lt;td&gt;uk, epa, mep, nepm&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;p&gt;The AQI standard option is worth noting — it changes the algorithm used to calculate the air quality index from PM2.5 readings. UK, EPA, Chinese MEP, and Australian NEPM standards are all available. Pick whichever one your local regulatory body uses.&lt;&#x2F;p&gt;
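&lt;p&gt;To give a sense of what the option switches between: the US EPA method maps PM2.5 concentration bands onto index bands and interpolates linearly within each band, while the other standards use their own tables or formulas. Here is the EPA version as a shell function. It is an illustration, not the exporter’s implementation; it assumes the long-standing PM2.5 breakpoint table (EPA revised the table in 2024, and I don’t know which one the exporter uses), and the function name is made up.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Illustration only, not the exporter&amp;#x27;s code: US EPA AQI from PM2.5, using
#   AQI = (I_hi - I_lo) &amp;#x2F; (C_hi - C_lo) * (C - C_lo) + I_lo
# and the long-standing (pre-2024) PM2.5 breakpoint table.
pm25_to_epa_aqi() {
  awk -v c=&amp;quot;$1&amp;quot; &amp;#x27;BEGIN {
    c = int(c * 10) &amp;#x2F; 10   # EPA truncates PM2.5 to one decimal place
    # Rows of four numbers: C_lo C_hi I_lo I_hi
    t = &amp;quot;0 12.0 0 50 12.1 35.4 51 100 35.5 55.4 101 150&amp;quot;
    t = t &amp;quot; 55.5 150.4 151 200 150.5 250.4 201 300&amp;quot;
    t = t &amp;quot; 250.5 350.4 301 400 350.5 500.4 401 500&amp;quot;
    n = split(t, b, &amp;quot; &amp;quot;)
    for (i = 1; i &amp;lt; n; i += 4) {
      clo = b[i] + 0; chi = b[i+1] + 0; ilo = b[i+2] + 0; ihi = b[i+3] + 0
      if (c &amp;gt;= clo &amp;amp;&amp;amp; c &amp;lt;= chi) {
        # Linear interpolation within the band, rounded to the nearest integer
        printf &amp;quot;%d\n&amp;quot;, int((ihi - ilo) &amp;#x2F; (chi - clo) * (c - clo) + ilo + 0.5)
        exit 0
      }
    }
    print &amp;quot;PM2.5 value out of range&amp;quot; &amp;gt; &amp;quot;&amp;#x2F;dev&amp;#x2F;stderr&amp;quot;
    exit 1
  }&amp;#x27;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For example, &lt;code&gt;pm25_to_epa_aqi 35.0&lt;&#x2F;code&gt; prints 99, and anything above 500.4 µg&#x2F;m³ is reported as out of range.&lt;&#x2F;p&gt;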
&lt;h2 id=&quot;multi-station-support&quot;&gt;Multi-station support&lt;&#x2F;h2&gt;
&lt;p&gt;Each Ecowitt gateway posts to a URL path you configure, and that path segment becomes the &lt;code&gt;station&lt;&#x2F;code&gt; label on every metric:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;POST &amp;#x2F;report&amp;#x2F;garden    → station=&amp;quot;garden&amp;quot;
POST &amp;#x2F;report&amp;#x2F;rooftop   → station=&amp;quot;rooftop&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;One exporter instance handles multiple stations. No duplicate deployments, no port juggling, no metric collisions. You just point each gateway at a different path.&lt;&#x2F;p&gt;
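&lt;p&gt;Because the station is just a label, listing the distinct &lt;code&gt;station&lt;&#x2F;code&gt; values in &lt;code&gt;&#x2F;metrics&lt;&#x2F;code&gt; is a quick way to see which stations the exporter is currently exposing. A sketch with a hypothetical helper name, assuming the port from the configuration above.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: list the distinct station labels in the exporter&amp;#x27;s
# &amp;#x2F;metrics output, e.g. to confirm that every gateway is reporting.
# Assumes the standard text exposition format: station=&amp;quot;name&amp;quot;.
list_stations() {
  grep -o &amp;#x27;station=&amp;quot;[^&amp;quot;]*&amp;quot;&amp;#x27; | sort -u | cut -d&amp;#x27;&amp;quot;&amp;#x27; -f2
}

# Usage: curl -s http:&amp;#x2F;&amp;#x2F;localhost:8088&amp;#x2F;metrics | list_stations
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;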
&lt;h2 id=&quot;forwarding-to-other-receivers&quot;&gt;Forwarding to other receivers&lt;&#x2F;h2&gt;
&lt;p&gt;Ecowitt gateways only support a single custom server destination. That’s a problem if you want data in both Prometheus and Home Assistant — or any other system that speaks the Ecowitt protocol. The exporter solves this by acting as a fan-out relay.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;code&gt;forwardUrls&lt;&#x2F;code&gt; option takes a list of URLs. Every time the exporter receives a POST from the gateway, it forwards the raw request body to each URL. The forwarding is fire-and-forget — the exporter doesn’t wait for responses and won’t block metric processing if a downstream receiver is slow or unreachable.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.prometheus-ecowitt-exporter = {
  forwardUrls = [
    &amp;quot;http:&amp;#x2F;&amp;#x2F;homeassistant.local:8123&amp;#x2F;api&amp;#x2F;webhook&amp;#x2F;ecowitt&amp;quot;
    &amp;quot;https:&amp;#x2F;&amp;#x2F;backup-exporter.lan:8088&amp;#x2F;report&amp;#x2F;garden&amp;quot;
  ];
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
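&lt;p&gt;Self-signed certificates are accepted: TLS verification is disabled on the forwarding client, so you don’t need to mess with CA bundles for internal services. The trade-off is that HTTPS forwarding is encrypted but not authenticated, so keep it on networks you trust.&lt;&#x2F;p&gt;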
&lt;p&gt;On the CLI, pass &lt;code&gt;--forward-url&lt;&#x2F;code&gt; once per destination:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;prometheus-ecowitt-exporter \
  --forward-url http:&amp;#x2F;&amp;#x2F;homeassistant.local:8123&amp;#x2F;api&amp;#x2F;webhook&amp;#x2F;ecowitt \
  --forward-url https:&amp;#x2F;&amp;#x2F;other-server:8088&amp;#x2F;report&amp;#x2F;mystation
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The exporter tracks forwarding health via two Prometheus counters: &lt;code&gt;ecowitt_forward_total&lt;&#x2F;code&gt; and &lt;code&gt;ecowitt_forward_errors&lt;&#x2F;code&gt;, both labeled by destination URL. The Grafana dashboard includes a “Forwarding” panel that shows the per-URL rate of both, so you’ll know immediately if a downstream receiver starts failing.&lt;&#x2F;p&gt;
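&lt;p&gt;Outside Grafana, the same information can be pulled straight from the exporter. This hypothetical helper prints every error series with a non-zero value. Keep in mind that these are cumulative counters, so a non-zero value means a destination has failed at some point since the exporter started; for “failing right now”, look at the rate, as the dashboard does.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: print each ecowitt_forward_errors series with a non-zero
# value, label set included, so a failing destination stands out. The URL label
# name is not assumed. (_total)? covers client libraries that append the suffix.
forward_errors() {
  awk &amp;#x27;$1 ~ &amp;#x2F;^ecowitt_forward_errors(_total)?([{].*)?$&amp;#x2F; &amp;amp;&amp;amp; $2 + 0 &amp;gt; 0 { print $1, $2 }&amp;#x27;
}

# Usage: curl -s http:&amp;#x2F;&amp;#x2F;localhost:8088&amp;#x2F;metrics | forward_errors
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;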
&lt;h2 id=&quot;sensor-location-labels&quot;&gt;Sensor location labels&lt;&#x2F;h2&gt;
&lt;p&gt;Multi-channel temperature and humidity sensors — the WN31, WH31, and similar — show up as numbered channels. Channel numbers are not particularly descriptive in a dashboard. The &lt;code&gt;tempLocations&lt;&#x2F;code&gt; option maps them to human-readable labels:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.prometheus-ecowitt-exporter = {
  tempLocations = {
    &amp;quot;1&amp;quot; = &amp;quot;greenhouse&amp;quot;;
    &amp;quot;2&amp;quot; = &amp;quot;garage&amp;quot;;
    &amp;quot;3&amp;quot; = &amp;quot;pool&amp;quot;;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;These strings become the &lt;code&gt;location&lt;&#x2F;code&gt; label on &lt;code&gt;ecowitt_temp&lt;&#x2F;code&gt; and &lt;code&gt;ecowitt_humidity&lt;&#x2F;code&gt; metrics. Your Grafana panels say “greenhouse” instead of “channel 1.”&lt;&#x2F;p&gt;
&lt;h2 id=&quot;configuring-the-weather-station-gateway&quot;&gt;Configuring the weather station gateway&lt;&#x2F;h2&gt;
&lt;p&gt;The gateway needs to know where to send its data. You configure this through the WSView Plus app on your phone:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Open WSView Plus and go to Device List&lt;&#x2F;li&gt;
&lt;li&gt;Select your gateway&lt;&#x2F;li&gt;
&lt;li&gt;Navigate to Weather Services, then Customized&lt;&#x2F;li&gt;
&lt;li&gt;Set Enable to On&lt;&#x2F;li&gt;
&lt;li&gt;Set Protocol Type to Ecowitt&lt;&#x2F;li&gt;
&lt;li&gt;Enter your NixOS host’s IP address as the Server IP&#x2F;Hostname&lt;&#x2F;li&gt;
&lt;li&gt;Set the Path to &lt;code&gt;&#x2F;report&#x2F;mystationname&lt;&#x2F;code&gt; — pick a name that makes sense as a Prometheus label&lt;&#x2F;li&gt;
&lt;li&gt;Set Port to &lt;code&gt;8088&lt;&#x2F;code&gt; (or whatever you configured)&lt;&#x2F;li&gt;
&lt;li&gt;Set Upload Interval to 60 seconds&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;That last setting controls how often you get new data points. Sixty seconds is a reasonable default. You can go lower, but your weather doesn’t change that fast.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-metrics-you-get&quot;&gt;What metrics you get&lt;&#x2F;h2&gt;
&lt;p&gt;The exporter produces a comprehensive set of metrics. Here’s what each family covers:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ecowitt_temp&lt;&#x2F;code&gt; — temperature readings from outdoor, indoor, and up to eight multi-channel sensors, with &lt;code&gt;station&lt;&#x2F;code&gt;, &lt;code&gt;sensor&lt;&#x2F;code&gt;, &lt;code&gt;unit&lt;&#x2F;code&gt;, and &lt;code&gt;location&lt;&#x2F;code&gt; labels&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;ecowitt_humidity&lt;&#x2F;code&gt; — outdoor, indoor, and eight channels&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;ecowitt_pressure&lt;&#x2F;code&gt; — barometric pressure in both absolute and relative variants, plus vapor pressure deficit&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;ecowitt_windspeed&lt;&#x2F;code&gt; and &lt;code&gt;ecowitt_windspeed_beaufort&lt;&#x2F;code&gt; — current speed, gust, max daily gust, and wind direction&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;ecowitt_rain&lt;&#x2F;code&gt; — hourly, daily, weekly, monthly, yearly, and event totals for both standard and WS90 piezo rain sensors&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;ecowitt_solar_radiation&lt;&#x2F;code&gt; and &lt;code&gt;ecowitt_uv_index&lt;&#x2F;code&gt; — solar metrics&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;ecowitt_lightning_distance&lt;&#x2F;code&gt; and &lt;code&gt;ecowitt_lightning_count&lt;&#x2F;code&gt; — lightning detection data&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;ecowitt_pm25&lt;&#x2F;code&gt; and &lt;code&gt;ecowitt_aqi&lt;&#x2F;code&gt; — PM2.5 concentration with 24-hour averages, plus calculated AQI under your chosen standard&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;ecowitt_soil_moisture&lt;&#x2F;code&gt; — multiple channels with raw voltage readings&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;ecowitt_battery&lt;&#x2F;code&gt; — status, voltage, and level for every sensor type&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;ecowitt_forward_total&lt;&#x2F;code&gt; and &lt;code&gt;ecowitt_forward_errors&lt;&#x2F;code&gt; — counters tracking forwarded requests and failures per destination URL&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Every metric carries appropriate labels so you can filter and aggregate in PromQL without ambiguity.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;grafana-dashboard&quot;&gt;Grafana dashboard&lt;&#x2F;h2&gt;
&lt;p&gt;When you set &lt;code&gt;enableGrafanaDashboard = true&lt;&#x2F;code&gt;, the module provisions a Grafana dashboard automatically through Grafana’s provisioning system. It covers all the metric families listed above with sensible panel layouts. You get graphs, gauges, and stat panels — the kind of thing that would take an afternoon to build by hand.&lt;&#x2F;p&gt;
&lt;p&gt;If you want to customize it, treat the provisioned dashboard as a starting point. By default Grafana won’t save changes to a provisioned dashboard in place, so use Save As to make a copy and edit that.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;docker-alternative&quot;&gt;Docker alternative&lt;&#x2F;h2&gt;
&lt;p&gt;Not on NixOS? The exporter builds as a standard Docker image:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;docker build -t prometheus-ecowitt-exporter .
docker run -p 8088:8088 prometheus-ecowitt-exporter \
  --temperature-unit c --pressure-unit hpa
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;You lose the automatic Prometheus and Grafana integration, but the exporter itself works the same way. Point your gateway at the container’s IP and port, configure your Prometheus scrape target manually, and import a dashboard.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;testing-with-curl&quot;&gt;Testing with curl&lt;&#x2F;h2&gt;
&lt;p&gt;You don’t need a weather station connected to verify the exporter is working. Fake a gateway POST:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;curl -X POST http:&amp;#x2F;&amp;#x2F;localhost:8088&amp;#x2F;report&amp;#x2F;test \
  -d &amp;quot;tempf=72.5&amp;amp;humidity=45&amp;amp;baromrelin=29.92&amp;amp;windspeedmph=5.6&amp;amp;dailyrainin=0.02&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then check the metrics endpoint:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;curl http:&amp;#x2F;&amp;#x2F;localhost:8088&amp;#x2F;metrics
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;You should see Prometheus-formatted metrics with the values you posted — converted to whatever units you configured. This is also useful for testing alerting rules before the next thunderstorm.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;supported-hardware&quot;&gt;Supported hardware&lt;&#x2F;h2&gt;
&lt;p&gt;The exporter handles data from the full Ecowitt ecosystem:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;WS2910&lt;&#x2F;strong&gt; — weather station&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;GW1100&lt;&#x2F;strong&gt; — Wi-Fi gateway&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;WS69 and WS90&lt;&#x2F;strong&gt; — sensor arrays&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;WH41 and WH43&lt;&#x2F;strong&gt; — PM2.5 air quality sensors&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;WH57&lt;&#x2F;strong&gt; — lightning detection sensor&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;WN31, WH31, WN32, WH32&lt;&#x2F;strong&gt; — temperature and humidity sensors&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;WN36&lt;&#x2F;strong&gt; — pool temperature sensor&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;WH51&lt;&#x2F;strong&gt; — soil moisture meter&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;If your sensor transmits to an Ecowitt gateway and the gateway can do a customized HTTP POST, the exporter will handle it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping up&lt;&#x2F;h2&gt;
&lt;p&gt;The whole setup is a flake input, a module configuration, and a phone app setting. Your weather data stays local, updates every minute, and sits in the same Prometheus instance as the rest of your infrastructure metrics. You can alert on freezing temperatures the same way you alert on disk space. And if you need the data elsewhere — Home Assistant, a backup exporter, whatever — the forwarding feature turns the single-destination limitation of Ecowitt gateways into a non-issue. The &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;ijohanne&#x2F;prometheus-ecowitt-exporter&quot;&gt;project is on GitHub&lt;&#x2F;a&gt; under MIT — contributions welcome.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Crane: Nix Builds for Rust Without Losing cargo build</title>
        <published>2026-03-28T12:00:00+00:00</published>
        <updated>2026-03-28T12:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/crane-dual-workflow-nix-rust/"/>
        <id>https://perlpimp.net/blog/crane-dual-workflow-nix-rust/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/crane-dual-workflow-nix-rust/">&lt;p&gt;Half your team uses Nix. The other half just wants &lt;code&gt;cargo build&lt;&#x2F;code&gt;. Both are right.&lt;&#x2F;p&gt;
&lt;p&gt;The default way Nix builds Rust is painful. &lt;code&gt;buildRustPackage&lt;&#x2F;code&gt; hashes your dependency tree with &lt;code&gt;cargoHash&lt;&#x2F;code&gt;. Every time you add or update a crate, you get a hash mismatch, go look up the new hash, paste it in, and rebuild everything from scratch. It’s the kind of workflow that makes people give up on Nix.&lt;&#x2F;p&gt;
&lt;p&gt;Crane fixes this. It splits the build into two derivations — one for dependencies, one for your code — so changing a source file doesn’t re-download and re-compile 400 crates. And it doesn’t touch your &lt;code&gt;Cargo.toml&lt;&#x2F;code&gt; or project layout, so &lt;code&gt;cargo build&lt;&#x2F;code&gt; keeps working exactly as before.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-flake-skeleton&quot;&gt;The flake skeleton&lt;&#x2F;h2&gt;
&lt;p&gt;Start with the inputs. Crane for the build, rust-overlay for toolchain control, pre-commit-hooks for formatting, flake-utils because you don’t want to write &lt;code&gt;system&lt;&#x2F;code&gt; four times.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs = {
    nixpkgs.url = &amp;quot;github:NixOS&amp;#x2F;nixpkgs&amp;#x2F;nixpkgs-unstable&amp;quot;;
    crane.url = &amp;quot;github:ipetkov&amp;#x2F;crane&amp;quot;;
    rust-overlay = {
      url = &amp;quot;github:oxalica&amp;#x2F;rust-overlay&amp;quot;;
      inputs.nixpkgs.follows = &amp;quot;nixpkgs&amp;quot;;
    };
    pre-commit-hooks = {
      url = &amp;quot;github:cachix&amp;#x2F;pre-commit-hooks.nix&amp;quot;;
      inputs.nixpkgs.follows = &amp;quot;nixpkgs&amp;quot;;
    };
    flake-utils.url = &amp;quot;github:numtide&amp;#x2F;flake-utils&amp;quot;;
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
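&lt;p&gt;Nothing unusual here. The &lt;code&gt;follows&lt;&#x2F;code&gt; declarations make rust-overlay and pre-commit-hooks reuse your nixpkgs instead of locking their own copies, which keeps the dependency tree shallow.&lt;&#x2F;p&gt;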
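&lt;p&gt;The inputs alone don’t build anything. Below is a minimal sketch of the matching &lt;code&gt;outputs&lt;&#x2F;code&gt; attribute that sits next to them in &lt;code&gt;flake.nix&lt;&#x2F;code&gt;. It assumes flake-utils’ &lt;code&gt;eachDefaultSystem&lt;&#x2F;code&gt; and applies rust-overlay as a nixpkgs overlay, which is what makes &lt;code&gt;pkgs.rust-bin&lt;&#x2F;code&gt; exist. The bindings in the following sections, such as &lt;code&gt;rustToolchain&lt;&#x2F;code&gt;, &lt;code&gt;craneLib&lt;&#x2F;code&gt; and &lt;code&gt;commonArgs&lt;&#x2F;code&gt;, would go in the &lt;code&gt;let&lt;&#x2F;code&gt; block. &lt;code&gt;packages&lt;&#x2F;code&gt;, &lt;code&gt;checks&lt;&#x2F;code&gt; and &lt;code&gt;devShells&lt;&#x2F;code&gt; go in the returned attribute set.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  outputs = { self, nixpkgs, crane, rust-overlay, pre-commit-hooks, flake-utils, ... }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        # rust-overlay adds pkgs.rust-bin to this nixpkgs instance.
        pkgs = import nixpkgs {
          inherit system;
          overlays = [ (import rust-overlay) ];
        };
        # rustToolchain, craneLib, src, commonArgs, cargoArtifacts ...
      in
      {
        # packages.default, checks and devShells.default ...
      });
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;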
&lt;h2 id=&quot;toolchain-and-crane-setup&quot;&gt;Toolchain and Crane setup&lt;&#x2F;h2&gt;
&lt;p&gt;rust-overlay gives you toolchains pinned by your flake lock, independent of whichever Rust version nixpkgs happens to ship. You pick stable, add the extensions you actually want, and hand it to Crane.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;rustToolchain = pkgs.rust-bin.stable.latest.default.override {
  extensions = [ &amp;quot;rust-src&amp;quot; &amp;quot;clippy&amp;quot; &amp;quot;rustfmt&amp;quot; ];
};
craneLib = (crane.mkLib pkgs).overrideToolchain rustToolchain;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;rust-src&lt;&#x2F;code&gt; is there for rust-analyzer. Without it, go-to-definition into std just gives you a wall of “source not available.”&lt;&#x2F;p&gt;
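&lt;p&gt;There is one caveat for the dual workflow. &lt;code&gt;stable.latest&lt;&#x2F;code&gt; moves whenever you update the rust-overlay lock, and the rustup users on the other half of the team won’t follow it. One option is to make a &lt;code&gt;rust-toolchain.toml&lt;&#x2F;code&gt; the single source of truth, because rustup reads that file natively and rust-overlay can read it too. In the sketch below, the file’s contents appear as comments, and the version number is a placeholder:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# rust-toolchain.toml at the repository root, also read by rustup:
#
#   [toolchain]
#   channel = &amp;quot;1.80.0&amp;quot;
#   components = [ &amp;quot;rust-src&amp;quot;, &amp;quot;clippy&amp;quot;, &amp;quot;rustfmt&amp;quot; ]
#
rustToolchain = pkgs.rust-bin.fromRustupToolchainFile .&amp;#x2F;rust-toolchain.toml;
craneLib = (crane.mkLib pkgs).overrideToolchain rustToolchain;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;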
&lt;h2 id=&quot;the-two-stage-build&quot;&gt;The two-stage build&lt;&#x2F;h2&gt;
&lt;p&gt;This is the core pattern. Crane splits your build into a dependency layer and a source layer.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;src = craneLib.cleanCargoSource .&amp;#x2F;.;

commonArgs = {
  inherit src;
  strictDeps = true;
  nativeBuildInputs = with pkgs; [ protobuf ];
  PROTOC = &amp;quot;${pkgs.protobuf}&amp;#x2F;bin&amp;#x2F;protoc&amp;quot;;
};

cargoArtifacts = craneLib.buildDepsOnly commonArgs;

my-service = craneLib.buildPackage (commonArgs &amp;#x2F;&amp;#x2F; {
  inherit cargoArtifacts;
});
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;buildDepsOnly&lt;&#x2F;code&gt; compiles every dependency in &lt;code&gt;Cargo.lock&lt;&#x2F;code&gt; and caches the result. &lt;code&gt;buildPackage&lt;&#x2F;code&gt; then takes those cached artifacts and only compiles your code. Change a line in &lt;code&gt;src&#x2F;main.rs&lt;&#x2F;code&gt; and you skip the entire dependency build.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;cleanCargoSource&lt;&#x2F;code&gt; strips non-Cargo files from the source — READMEs, CI configs, docs. This matters because Nix derivations rebuild when their inputs change. Without it, editing your README invalidates your build cache.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;code&gt;PROTOC&lt;&#x2F;code&gt; env var is there because protobuf codegen needs to find the compiler. If your project doesn’t use protobuf, drop both lines. The pattern is the same for any native dependency — declare it in &lt;code&gt;nativeBuildInputs&lt;&#x2F;code&gt;, set env vars if needed.&lt;&#x2F;p&gt;
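&lt;p&gt;There is a catch when the two combine. &lt;code&gt;cleanCargoSource&lt;&#x2F;code&gt; keeps only Rust and Cargo-related files, so any &lt;code&gt;.proto&lt;&#x2F;code&gt; files get filtered out as well. That breaks codegen if a &lt;code&gt;build.rs&lt;&#x2F;code&gt; hands them to protoc, which is the typical setup. The sketch below widens the filter by combining Crane’s own &lt;code&gt;filterCargoSources&lt;&#x2F;code&gt; with a suffix check. It assumes your schemas use the &lt;code&gt;.proto&lt;&#x2F;code&gt; extension:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;src = pkgs.lib.cleanSourceWith {
  src = .&amp;#x2F;.;
  # Keep everything Crane would keep, plus protobuf schemas for build.rs.
  filter = path: type:
    (pkgs.lib.hasSuffix &amp;quot;.proto&amp;quot; path)
    || (craneLib.filterCargoSources path type);
  name = &amp;quot;source&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;README edits still leave the derivation untouched, while a schema change correctly triggers a rebuild.&lt;&#x2F;p&gt;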
&lt;h2 id=&quot;private-git-dependencies&quot;&gt;Private git dependencies&lt;&#x2F;h2&gt;
&lt;p&gt;Things get interesting when your &lt;code&gt;Cargo.toml&lt;&#x2F;code&gt; pulls crates from private repos. Crane calls &lt;code&gt;builtins.fetchGit&lt;&#x2F;code&gt; under the hood, which works fine on single-user Nix installs — your SSH keys are right there. On multi-user installs, including Determinate Nix, the nix daemon runs as its own user and has no access to your keys.&lt;&#x2F;p&gt;
&lt;p&gt;The fix is to pre-fetch private deps as flake inputs — the flake machinery handles authentication — and then tell Crane’s vendoring to use those pre-fetched sources instead of trying to fetch them itself.&lt;&#x2F;p&gt;
&lt;p&gt;First, add the dep as a &lt;code&gt;flake = false&lt;&#x2F;code&gt; input:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs = {
    my-private-dep-src = {
      url = &amp;quot;github:myorg&amp;#x2F;my-private-dep&amp;quot;;
      flake = false;
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
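&lt;p&gt;The &lt;code&gt;github:&lt;&#x2F;code&gt; fetcher can only see a private repository if Nix has a GitHub token, supplied via the &lt;code&gt;access-tokens&lt;&#x2F;code&gt; setting in &lt;code&gt;nix.conf&lt;&#x2F;code&gt;. If you would rather reuse your SSH keys, a &lt;code&gt;git+ssh&lt;&#x2F;code&gt; URL works for flake inputs as well. Here is a sketch with the same hypothetical repository:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs = {
    my-private-dep-src = {
      # Fetched over SSH with your own keys when the flake is locked.
      # Appending ?rev=... pins it to the commit that Cargo.lock records.
      url = &amp;quot;git+ssh:&amp;#x2F;&amp;#x2F;git@github.com&amp;#x2F;myorg&amp;#x2F;my-private-dep&amp;quot;;
      flake = false;
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Keeping the flake input and &lt;code&gt;Cargo.lock&lt;&#x2F;code&gt; on the same commit matters. Overriding the checkout’s &lt;code&gt;src&lt;&#x2F;code&gt; means the build uses whatever commit the flake input is locked to.&lt;&#x2F;p&gt;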
&lt;p&gt;Then override the vendoring:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;cargoVendorDir = craneLib.vendorCargoDeps {
  src = craneLib.cleanCargoSource .&amp;#x2F;.;
  overrideVendorGitCheckout = lockMetadata: drv:
    let
      source = (builtins.head lockMetadata).source or &amp;quot;&amp;quot;;
    in
    if builtins.match &amp;quot;.*my-private-dep.*&amp;quot; source != null then
      drv.overrideAttrs (_: {
        src = my-private-dep-src;
      })
    else
      drv;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then thread it into your build args:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;commonArgs = {
  inherit src cargoVendorDir;
  strictDeps = true;
  # ...
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;the-gotchas&quot;&gt;The gotchas&lt;&#x2F;h3&gt;
&lt;p&gt;There are four things that will waste your afternoon if you don’t know them upfront.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;lockMetadata&lt;&#x2F;code&gt; is a list, not an attrset.&lt;&#x2F;strong&gt; The callback receives a list of lock entries. You need &lt;code&gt;(builtins.head lockMetadata).source&lt;&#x2F;code&gt; to get the git URL. Treating it as an attrset gives you an unhelpful type error deep in the Nix evaluator.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Match on &lt;code&gt;source&lt;&#x2F;code&gt;, not on the input name.&lt;&#x2F;strong&gt; If you have multiple private git deps and write a blanket override, you’ll apply the wrong source to the wrong dep. The &lt;code&gt;source&lt;&#x2F;code&gt; field contains the git URL from &lt;code&gt;Cargo.lock&lt;&#x2F;code&gt; — match against it.&lt;&#x2F;p&gt;
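&lt;p&gt;&lt;strong&gt;The &lt;code&gt;else drv&lt;&#x2F;code&gt; fallback is essential.&lt;&#x2F;strong&gt; The callback runs for every git dependency, not just the private ones, and whatever it returns replaces that dependency’s checkout. Nix forces you to write an &lt;code&gt;else&lt;&#x2F;code&gt; branch. If that branch returns anything other than the untouched &lt;code&gt;drv&lt;&#x2F;code&gt;, such as &lt;code&gt;null&lt;&#x2F;code&gt;, Crane silently produces a broken vendor directory. You’ll get a cargo error about missing crates with no indication that your Nix code is the problem.&lt;&#x2F;p&gt;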
&lt;p&gt;&lt;strong&gt;Use &lt;code&gt;drv.overrideAttrs&lt;&#x2F;code&gt;, not a fresh derivation.&lt;&#x2F;strong&gt; Crane’s vendor checkout derivation carries metadata that the rest of the pipeline depends on. Replacing it with &lt;code&gt;pkgs.runCommand&lt;&#x2F;code&gt; or similar drops that metadata and breaks downstream.&lt;&#x2F;p&gt;
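&lt;p&gt;Put together, a version that handles more than one private dependency might look like the sketch below. It assumes a second, hypothetical &lt;code&gt;flake = false&lt;&#x2F;code&gt; input called &lt;code&gt;my-other-dep-src&lt;&#x2F;code&gt;. It checks every entry in &lt;code&gt;lockMetadata&lt;&#x2F;code&gt; rather than just the first, matches on the git URL, falls through to &lt;code&gt;drv&lt;&#x2F;code&gt;, and only ever uses &lt;code&gt;overrideAttrs&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;cargoVendorDir = craneLib.vendorCargoDeps {
  src = craneLib.cleanCargoSource .&amp;#x2F;.;
  overrideVendorGitCheckout = lockMetadata: drv:
    let
      # Substring of a git URL in Cargo.lock, mapped to its pre-fetched input.
      overrides = {
        &amp;quot;myorg&amp;#x2F;my-private-dep&amp;quot; = my-private-dep-src;
        &amp;quot;myorg&amp;#x2F;my-other-dep&amp;quot; = my-other-dep-src; # hypothetical second input
      };
      # lockMetadata is a list: collect the source URL of every entry.
      sources = map (p: p.source or &amp;quot;&amp;quot;) lockMetadata;
      matching = builtins.filter
        (needle: builtins.any (s: pkgs.lib.hasInfix needle s) sources)
        (builtins.attrNames overrides);
    in
    if matching == [ ] then
      drv # not one of ours: hand Crane its own checkout back unchanged
    else
      drv.overrideAttrs (_: {
        src = overrides.${builtins.head matching};
      });
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;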
&lt;h2 id=&quot;the-devshell-cargo-build-just-works&quot;&gt;The devShell — cargo build just works&lt;&#x2F;h2&gt;
&lt;p&gt;Crane provides &lt;code&gt;craneLib.devShell&lt;&#x2F;code&gt;, which drops you into a shell with the Rust toolchain and any extra packages you specify. Inside it, &lt;code&gt;cargo build&lt;&#x2F;code&gt;, &lt;code&gt;cargo test&lt;&#x2F;code&gt;, &lt;code&gt;cargo run&lt;&#x2F;code&gt; — all work exactly as they would outside Nix.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;devShells.default = craneLib.devShell {
  checks = self.checks.${system};
  packages = with pkgs; [ protobuf sqlite cargo-watch ];
  PROTOC = &amp;quot;${pkgs.protobuf}&amp;#x2F;bin&amp;#x2F;protoc&amp;quot;;
  shellHook = &amp;#x27;&amp;#x27;
    ${self.checks.${system}.pre-commit.shellHook}
  &amp;#x27;&amp;#x27;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;checks&lt;&#x2F;code&gt; attribute pulls the build inputs of every &lt;code&gt;nix flake check&lt;&#x2F;code&gt; derivation into the shell, so anything the checks need to build is available here too. &lt;code&gt;packages&lt;&#x2F;code&gt; adds tools that aren’t Rust but that your project needs. The &lt;code&gt;shellHook&lt;&#x2F;code&gt; installs pre-commit hooks automatically (more on that below).&lt;&#x2F;p&gt;
&lt;p&gt;This is the key insight: the devShell gives Nix developers the standard Cargo workflow. They don’t run &lt;code&gt;nix build&lt;&#x2F;code&gt; during development. They run &lt;code&gt;cargo build&lt;&#x2F;code&gt;. Nix just sets up the environment.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;direnv-makes-it-transparent&quot;&gt;direnv makes it transparent&lt;&#x2F;h2&gt;
&lt;p&gt;The entire &lt;code&gt;.envrc&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;use flake
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That’s it. After a one-time &lt;code&gt;direnv allow&lt;&#x2F;code&gt;, walking into the project directory gives your shell the right Rust version, protobuf, sqlite, and whatever else the devShell declares. No &lt;code&gt;nix develop&lt;&#x2F;code&gt;, no manual activation.&lt;&#x2F;p&gt;
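&lt;p&gt;direnv itself has to be installed and hooked into your shell. If you use Home Manager, here is a minimal sketch. The nix-direnv part is optional, but it caches the evaluated devShell, so re-entering the directory doesn’t re-evaluate the flake every time:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;programs.direnv = {
  enable = true;
  # Cached implementation of use flake.
  nix-direnv.enable = true;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;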
&lt;h2 id=&quot;the-non-nix-escape-hatch&quot;&gt;The non-Nix escape hatch&lt;&#x2F;h2&gt;
&lt;p&gt;For developers who don’t have Nix installed, you need an alternative path. The approach that works: document the environment variables your devShell sets and provide a setup script that installs the same tools via rustup, brew, apt, whatever your team uses.&lt;&#x2F;p&gt;
&lt;p&gt;The shared interface is environment variables. &lt;code&gt;PROTOC&lt;&#x2F;code&gt; points at the protobuf compiler. &lt;code&gt;DATABASE_URL&lt;&#x2F;code&gt; points at the dev database. Nix sets these in the devShell; non-Nix developers set them manually or via a &lt;code&gt;setup.sh&lt;&#x2F;code&gt;. The Rust code doesn’t care where they came from.&lt;&#x2F;p&gt;
&lt;p&gt;This is what “dual workflow” actually means. Not two build systems — one codebase that works with or without Nix, with the build system being orthogonal to the development experience.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;local-path-overrides-and-the-trap&quot;&gt;Local path overrides and the trap&lt;&#x2F;h2&gt;
&lt;p&gt;When you’re iterating on a private dependency locally, you don’t want to push, update the flake input, and rebuild. Cargo has path overrides for this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;cargo build --config \
  &amp;#x27;patch.&amp;quot;https:&amp;#x2F;&amp;#x2F;github.com&amp;#x2F;myorg&amp;#x2F;my-dep.git&amp;quot;.my-dep.path=&amp;quot;..&amp;#x2F;my-dep&amp;quot;&amp;#x27;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This tells Cargo to use your local checkout of &lt;code&gt;my-dep&lt;&#x2F;code&gt; instead of the git version. Works great for development.&lt;&#x2F;p&gt;
&lt;p&gt;The trap: never persist this in &lt;code&gt;.cargo&#x2F;config.toml&lt;&#x2F;code&gt;. It works on your machine because &lt;code&gt;..&#x2F;my-dep&lt;&#x2F;code&gt; exists. Inside the Nix sandbox, that path doesn’t exist. Your &lt;code&gt;nix build&lt;&#x2F;code&gt; will fail with a confusing “path not found” error that has nothing to do with Nix — it’s Cargo looking for a directory that isn’t there.&lt;&#x2F;p&gt;
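&lt;p&gt;Keep it on the command line, or put it in a &lt;code&gt;.cargo&#x2F;config.toml&lt;&#x2F;code&gt; that’s listed in &lt;code&gt;.gitignore&lt;&#x2F;code&gt;. Flakes only copy git-tracked files into the store, so an ignored config never reaches the sandbox. Either way, don’t commit it.&lt;&#x2F;p&gt;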
&lt;h2 id=&quot;pre-commit-hooks&quot;&gt;Pre-commit hooks&lt;&#x2F;h2&gt;
&lt;p&gt;Formatting arguments are a waste of everyone’s time. Automate them.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;pre-commit = pre-commit-hooks.lib.${system}.run {
  src = .&amp;#x2F;.;
  hooks = {
    nixfmt.enable = true;
    rustfmt.enable = true;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This runs &lt;code&gt;nixfmt&lt;&#x2F;code&gt; on &lt;code&gt;.nix&lt;&#x2F;code&gt; files and &lt;code&gt;rustfmt&lt;&#x2F;code&gt; on Rust files before every commit. The &lt;code&gt;shellHook&lt;&#x2F;code&gt; in the devShell installs the git hooks automatically, so Nix developers get them without thinking about it.&lt;&#x2F;p&gt;
&lt;p&gt;Non-Nix developers can install the same hooks via &lt;code&gt;pre-commit install&lt;&#x2F;code&gt; with a &lt;code&gt;.pre-commit-config.yaml&lt;&#x2F;code&gt; — same tools, different entry point.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;flake-checks-everything-in-one-command&quot;&gt;Flake checks — everything in one command&lt;&#x2F;h2&gt;
&lt;p&gt;Wire up your checks so &lt;code&gt;nix flake check&lt;&#x2F;code&gt; runs the build, Clippy, and format verification in one shot.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;checks = {
  inherit my-service;
  inherit pre-commit;
  my-service-clippy = craneLib.cargoClippy (commonArgs &amp;#x2F;&amp;#x2F; {
    inherit cargoArtifacts;
    cargoClippyExtraArgs = &amp;quot;--all-targets -- -D warnings&amp;quot;;
  });
  my-service-fmt = craneLib.cargoFmt { inherit src; };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;inherit my-service&lt;&#x2F;code&gt; means the build itself is a check — if it doesn’t compile, the check fails. &lt;code&gt;cargoClippy&lt;&#x2F;code&gt; runs lints with &lt;code&gt;-D warnings&lt;&#x2F;code&gt; so any warning is a hard failure. &lt;code&gt;cargoFmt&lt;&#x2F;code&gt; verifies formatting without modifying files.&lt;&#x2F;p&gt;
&lt;p&gt;In CI, your entire pipeline is &lt;code&gt;nix flake check&lt;&#x2F;code&gt;. One command, fully hermetic, same result on every machine.&lt;&#x2F;p&gt;
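&lt;p&gt;Crane has more check builders in the same style. The sketch below shows one example: a rustdoc check that reuses the cached dependency artifacts. &lt;code&gt;RUSTDOCFLAGS&lt;&#x2F;code&gt; is passed as a plain attribute, the same way &lt;code&gt;PROTOC&lt;&#x2F;code&gt; is. Failing on every rustdoc warning is a choice, not a requirement:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;checks = {
  # ... the checks above ...
  my-service-doc = craneLib.cargoDoc (commonArgs &amp;#x2F;&amp;#x2F; {
    inherit cargoArtifacts;
    # Exported as an environment variable; rustdoc warnings such as
    # broken intra-doc links become errors.
    RUSTDOCFLAGS = &amp;quot;-D warnings&amp;quot;;
  });
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;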
&lt;h2 id=&quot;clippy-pedantic&quot;&gt;Clippy pedantic&lt;&#x2F;h2&gt;
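&lt;p&gt;While you’re setting up lints, turn on Clippy’s pedantic group. It catches things the default lints miss. Examples include arguments taken by value but only borrowed, numeric casts that can silently truncate, and public functions returning &lt;code&gt;Result&lt;&#x2F;code&gt; without an &lt;code&gt;# Errors&lt;&#x2F;code&gt; section in their docs.&lt;&#x2F;p&gt;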
&lt;p&gt;In your &lt;code&gt;Cargo.toml&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;toml&quot; class=&quot;language-toml &quot;&gt;&lt;code class=&quot;language-toml&quot; data-lang=&quot;toml&quot;&gt;[lints.clippy]
pedantic = { level = &amp;quot;warn&amp;quot;, priority = -1 }
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
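&lt;p&gt;&lt;code&gt;level = &amp;quot;warn&amp;quot;&lt;&#x2F;code&gt; turns the whole pedantic group into warnings. &lt;code&gt;priority = -1&lt;&#x2F;code&gt; makes Cargo apply the group before individual lints, so single lints set at the default priority still override it. Some pedantic lints are genuinely annoying (&lt;code&gt;module_name_repetitions&lt;&#x2F;code&gt; comes to mind), and you’ll want to allow those. Use &lt;code&gt;#[allow]&lt;&#x2F;code&gt; or an entry like &lt;code&gt;module_name_repetitions = &amp;quot;allow&amp;quot;&lt;&#x2F;code&gt; in the same table. But the majority catch real issues.&lt;&#x2F;p&gt;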
&lt;h2 id=&quot;the-result&quot;&gt;The result&lt;&#x2F;h2&gt;
&lt;p&gt;You end up with a Rust project that builds with &lt;code&gt;cargo build&lt;&#x2F;code&gt; for daily development and &lt;code&gt;nix build&lt;&#x2F;code&gt; for CI and deployment. The Cargo workflow is untouched — no wrapper scripts, no custom build commands, no “run this Nix thing instead.” Developers who don’t use Nix never need to know it’s there.&lt;&#x2F;p&gt;
&lt;p&gt;The Nix side gives you hermetic CI, binary caching, and NixOS deployment modules. Crane’s two-stage build means your CI isn’t re-compiling 400 crates on every push. And when something breaks, it breaks the same way on every machine, which — if you’ve spent enough time debugging “works on my laptop” — is actually a feature.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Running a Private OCI Registry on NixOS with Zot</title>
        <published>2026-03-27T12:00:00+00:00</published>
        <updated>2026-03-27T12:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/private-oci-registry-nixos-zot/"/>
        <id>https://perlpimp.net/blog/private-oci-registry-nixos-zot/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/private-oci-registry-nixos-zot/">&lt;p&gt;Every container you push to Docker Hub is one &lt;code&gt;docker pull&lt;&#x2F;code&gt; rate limit away from ruining your Friday night deployment. You could pay for a hosted registry, but you already have perfectly good hardware sitting in a rack. What you need is something lightweight, OCI-native, and not beholden to anyone’s pricing page.&lt;&#x2F;p&gt;
&lt;p&gt;Zot fits the bill. It’s a single-binary, vendor-neutral OCI registry that speaks the distribution spec natively. No wrapping Docker’s registry in duct tape. No Java. It ships with a built-in UI, search, vulnerability scanning, and Prometheus metrics. And there’s a NixOS module that makes the whole thing declarative.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;adding-the-module&quot;&gt;Adding the module&lt;&#x2F;h2&gt;
&lt;p&gt;The Zot NixOS module lives in &lt;code&gt;ijohanne&#x2F;nur-packages&lt;&#x2F;code&gt;. You can pull it in two ways.&lt;&#x2F;p&gt;
&lt;p&gt;Through the NUR overlay:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs.nur.url = &amp;quot;github:nix-community&amp;#x2F;NUR&amp;quot;;

  outputs = { self, nixpkgs, nur, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      modules = [
        nur.modules.nixos.repos.ijohanne.zot
      ];
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Or as a direct flake input — useful if you don’t want the full NUR:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs.nur-packages.url = &amp;quot;github:ijohanne&amp;#x2F;nur-packages&amp;quot;;

  outputs = { self, nixpkgs, nur-packages, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      modules = [
        nur-packages.nixosModules.zot
      ];
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Either way, you get the same module. Pick whichever matches how you already manage inputs.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;basic-setup&quot;&gt;Basic setup&lt;&#x2F;h2&gt;
&lt;p&gt;The minimum viable registry is surprisingly short:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.zot = {
  enable = true;
  dataDir = &amp;quot;&amp;#x2F;var&amp;#x2F;lib&amp;#x2F;zot&amp;quot;;
  user = &amp;quot;zot&amp;quot;;
  group = &amp;quot;zot&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That gets you Zot running on port 5000 with sane defaults — distribution spec 1.1.1, garbage collection on a 1-hour delay with a 6-hour interval, search, UI, scrub, metrics, and lint extensions all enabled. CVE database updates every 2 hours, scrub runs every 24 hours.&lt;&#x2F;p&gt;
&lt;p&gt;But a registry on localhost port 5000 isn’t useful to anyone. You want TLS, a real domain, and authentication. The module handles all of that.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;nginx-reverse-proxy&quot;&gt;Nginx reverse proxy&lt;&#x2F;h2&gt;
&lt;p&gt;The module includes an nginx integration that does the right things:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.zot.nginx = {
  enable = true;
  domain = &amp;quot;registry.example.com&amp;quot;;
  forceSSL = true;
  acme = true;
  acmeDns01 = false;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This generates an nginx virtual host with Let’s Encrypt certificates, sets &lt;code&gt;client_max_body_size&lt;&#x2F;code&gt; to 0 — because you don’t want nginx rejecting your 2 GB container images — disables proxy buffering, and enables chunked transfer encoding. It also blocks the &lt;code&gt;&#x2F;metrics&lt;&#x2F;code&gt; endpoint with a 403, which matters later when you set up monitoring.&lt;&#x2F;p&gt;
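&lt;p&gt;Let’s Encrypt on NixOS has two host-level prerequisites that live outside the zot module. This is an assumption, so check whether the module already sets them. The sketch below uses a placeholder contact address. With &lt;code&gt;acmeDns01 = false&lt;&#x2F;code&gt;, the HTTP-01 challenge needs port 80 to be reachable:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# NixOS will not request certificates until the CA terms are accepted.
security.acme = {
  acceptTerms = true;
  defaults.email = &amp;quot;hostmaster@example.com&amp;quot;;
};

# Port 80 for the HTTP-01 challenge, 443 for the registry itself.
networking.firewall.allowedTCPPorts = [ 80 443 ];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;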
&lt;h2 id=&quot;users-and-authentication&quot;&gt;Users and authentication&lt;&#x2F;h2&gt;
&lt;p&gt;Zot uses htpasswd for authentication. The module manages it declaratively, regenerating the htpasswd file on every service start. Passwords come from files, which makes this sops-nix friendly:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.zot.auth.users = {
  admin = {
    passwordFile = config.sops.secrets.zot_admin_password.path;
    admin = true;
  };
  ci-user = {
    passwordFile = config.sops.secrets.zot_ci_password.path;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;No plaintext passwords in your Nix config. The &lt;code&gt;passwordFile&lt;&#x2F;code&gt; attribute points to a file containing the raw password — sops-nix decrypts it at activation time, and the module reads it at service start. The &lt;code&gt;admin = true&lt;&#x2F;code&gt; flag gives the user elevated privileges in the access control system.&lt;&#x2F;p&gt;
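&lt;p&gt;On the sops-nix side, the secrets referenced above still need declaring. The sketch below makes two assumptions. First, &lt;code&gt;sops.defaultSopsFile&lt;&#x2F;code&gt; points at a file containing both keys. Second, the htpasswd file is generated as the &lt;code&gt;zot&lt;&#x2F;code&gt; user, so that user must be able to read the secrets:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;sops.secrets.zot_admin_password = {
  # Decrypted to &amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;zot_admin_password, owned by the Zot user.
  owner = config.services.zot.user;
};
sops.secrets.zot_ci_password = {
  owner = config.services.zot.user;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;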
&lt;p&gt;One thing to note: if &lt;code&gt;metrics.enable&lt;&#x2F;code&gt; is true — which it is by default — the module automatically creates a Prometheus scraping user. You don’t need to add one manually.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;access-control&quot;&gt;Access control&lt;&#x2F;h2&gt;
&lt;p&gt;Authentication tells you &lt;em&gt;who&lt;&#x2F;em&gt; someone is. Access control tells you &lt;em&gt;what they can do&lt;&#x2F;em&gt;. The module supports fine-grained, per-repository policies using glob patterns:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.zot.accessControl = {
  adminActions = [ &amp;quot;read&amp;quot; &amp;quot;create&amp;quot; &amp;quot;update&amp;quot; &amp;quot;delete&amp;quot; ];
  defaultPolicy = [];
  anonymousPolicy = [];
  repositories.&amp;quot;myorg&amp;#x2F;**&amp;quot; = {
    policies = [{
      users = [ &amp;quot;ci-user&amp;quot; ];
      actions = [ &amp;quot;read&amp;quot; &amp;quot;create&amp;quot; &amp;quot;update&amp;quot; &amp;quot;delete&amp;quot; ];
    }];
    defaultPolicy = [];
    anonymousPolicy = [ &amp;quot;read&amp;quot; ];
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This configuration locks things down. The &lt;code&gt;defaultPolicy&lt;&#x2F;code&gt; and &lt;code&gt;anonymousPolicy&lt;&#x2F;code&gt; at the top level are empty — no access unless explicitly granted. Admin users get full read&#x2F;create&#x2F;update&#x2F;delete. Under &lt;code&gt;myorg&#x2F;**&lt;&#x2F;code&gt;, &lt;code&gt;ci-user&lt;&#x2F;code&gt; gets full access and anonymous users can pull.&lt;&#x2F;p&gt;
&lt;p&gt;That glob pattern is doing the heavy lifting. Everything under &lt;code&gt;myorg&#x2F;&lt;&#x2F;code&gt; — &lt;code&gt;myorg&#x2F;frontend&lt;&#x2F;code&gt;, &lt;code&gt;myorg&#x2F;api&lt;&#x2F;code&gt;, &lt;code&gt;myorg&#x2F;worker&lt;&#x2F;code&gt; — matches the same policy. You can add as many repository blocks as you need, each with its own set of users and permissions.&lt;&#x2F;p&gt;
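&lt;p&gt;For example, here is a second block for a hypothetical &lt;code&gt;internal&#x2F;**&lt;&#x2F;code&gt; prefix, sketched with the same options as above. CI can push and overwrite tags but not delete them. Any authenticated user can pull via &lt;code&gt;defaultPolicy&lt;&#x2F;code&gt;, and anonymous users get nothing. Defined separately like this, it should merge with the &lt;code&gt;myorg&#x2F;**&lt;&#x2F;code&gt; block through the NixOS module system, assuming &lt;code&gt;repositories&lt;&#x2F;code&gt; is an attribute set of submodules.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.zot.accessControl.repositories.&amp;quot;internal&amp;#x2F;**&amp;quot; = {
  policies = [{
    users = [ &amp;quot;ci-user&amp;quot; ];
    # No &amp;quot;delete&amp;quot;: CI can push new tags and move existing ones.
    actions = [ &amp;quot;read&amp;quot; &amp;quot;create&amp;quot; &amp;quot;update&amp;quot; ];
  }];
  # Any other authenticated user can pull.
  defaultPolicy = [ &amp;quot;read&amp;quot; ];
  # Anonymous users get nothing.
  anonymousPolicy = [];
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;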
&lt;h2 id=&quot;retention-policies&quot;&gt;Retention policies&lt;&#x2F;h2&gt;
&lt;p&gt;Registries accumulate images. Without retention policies, your disk fills up. The module makes garbage collection declarative:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.zot.retention = {
  dryRun = false;
  delay = &amp;quot;24h&amp;quot;;
  policies = [
    {
      repositories = [ &amp;quot;myorg&amp;#x2F;**&amp;quot; ];
      deleteReferrers = true;
      deleteUntagged = true;
      keepTags = [
        { patterns = [ &amp;quot;latest&amp;quot; ]; }
        { pushedWithin = &amp;quot;168h&amp;quot;; }
        { mostRecentlyPushedCount = 10; }
        { patterns = [ &amp;quot;v.*&amp;quot; ]; pulledWithin = &amp;quot;720h&amp;quot;; }
      ];
    }
  ];
  defaultPolicy = {
    deleteReferrers = false;
    deleteUntagged = true;
    keepTags = [
      { patterns = [ &amp;quot;.*&amp;quot; ]; }
    ];
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
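&lt;p&gt;Read the &lt;code&gt;keepTags&lt;&#x2F;code&gt; list as a set of survival conditions. An image in &lt;code&gt;myorg&#x2F;**&lt;&#x2F;code&gt; stays if it meets any one of them:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;it’s tagged &lt;code&gt;latest&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;it was pushed in the last 7 days&lt;&#x2F;li&gt;
&lt;li&gt;it’s one of the 10 most recently pushed&lt;&#x2F;li&gt;
&lt;li&gt;its tag matches &lt;code&gt;v.*&lt;&#x2F;code&gt; (typically your release tags) and it was pulled in the last 30 days&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Everything else gets cleaned up after 24 hours.&lt;&#x2F;p&gt;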
&lt;p&gt;The &lt;code&gt;defaultPolicy&lt;&#x2F;code&gt; at the bottom is the fallback — it keeps everything tagged and deletes untagged images. Conservative, but it stops the obvious leak.&lt;&#x2F;p&gt;
&lt;p&gt;Set &lt;code&gt;dryRun = true&lt;&#x2F;code&gt; first and check the logs before you let it actually delete anything.&lt;&#x2F;p&gt;
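&lt;p&gt;Policies are a list, so a namespace with different needs gets its own entry. The sketch below covers a hypothetical &lt;code&gt;scratch&#x2F;**&lt;&#x2F;code&gt; prefix holding short-lived PR builds, using only the fields shown above. If &lt;code&gt;policies&lt;&#x2F;code&gt; is a list option, the module system concatenates this with your existing definition:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.zot.retention.policies = [
  {
    repositories = [ &amp;quot;scratch&amp;#x2F;**&amp;quot; ];
    deleteReferrers = true;
    deleteUntagged = true;
    # An image survives only as one of the 5 newest tags,
    # or if it was pushed in the last 2 days.
    keepTags = [
      { mostRecentlyPushedCount = 5; }
      { pushedWithin = &amp;quot;48h&amp;quot;; }
    ];
  }
];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;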
&lt;h2 id=&quot;monitoring-on-a-single-server&quot;&gt;Monitoring on a single server&lt;&#x2F;h2&gt;
&lt;p&gt;If Prometheus and Grafana run on the same machine as your registry, the module does everything for you:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.zot = {
  metrics.enable = true;
  metrics.user = &amp;quot;prometheus&amp;quot;;
  metrics.password = &amp;quot;prometheus&amp;quot;;
  enableLocalScraping = true;
  grafanaDashboard = true;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;enableLocalScraping&lt;&#x2F;code&gt; adds a Prometheus scrape config pointed at Zot’s metrics endpoint on localhost. &lt;code&gt;grafanaDashboard&lt;&#x2F;code&gt; provisions a Grafana dashboard automatically. Metrics are enabled by default, so you technically only need the last two lines.&lt;&#x2F;p&gt;
&lt;p&gt;The auto-created Prometheus user authenticates against Zot’s htpasswd, so scraping goes through the same auth layer as everything else.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;monitoring-from-an-external-host&quot;&gt;Monitoring from an external host&lt;&#x2F;h2&gt;
&lt;p&gt;When Prometheus runs on a different machine, you configure the scrape config yourself:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.prometheus.scrapeConfigs = [
  {
    job_name = &amp;quot;zot&amp;quot;;
    honor_labels = true;
    metrics_path = &amp;quot;&amp;#x2F;metrics&amp;quot;;
    basic_auth = {
      username = &amp;quot;prometheus&amp;quot;;
      password = &amp;quot;prometheus&amp;quot;;
    };
    static_configs = [
      {
        targets = [ &amp;quot;10.255.101.200:5000&amp;quot; ];
        labels = { instance = &amp;quot;registry&amp;quot;; };
      }
    ];
  }
];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The critical detail here — point your scrape target at Zot’s port directly, not at the nginx domain. The nginx config blocks &lt;code&gt;&#x2F;metrics&lt;&#x2F;code&gt; with a 403. You want &lt;code&gt;10.255.101.200:5000&lt;&#x2F;code&gt;, not &lt;code&gt;registry.example.com:443&lt;&#x2F;code&gt;. If your monitoring traffic crosses untrusted networks, put it behind a VPN or a separate authenticated tunnel.&lt;&#x2F;p&gt;
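&lt;p&gt;You can check both halves of that from the monitoring host. The direct request should return metrics, and the proxied one should be refused. The credentials below are the example ones from the scrape config above; substitute your own:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Direct to the Zot port: expect Prometheus text-format metrics
curl -s -u prometheus:prometheus http:&amp;#x2F;&amp;#x2F;10.255.101.200:5000&amp;#x2F;metrics | head -n 5

# Through nginx: expect 403, because the proxy blocks &amp;#x2F;metrics
curl -s -o &amp;#x2F;dev&amp;#x2F;null -w &amp;#x27;%{http_code}\n&amp;#x27; https:&amp;#x2F;&amp;#x2F;registry.example.com&amp;#x2F;metrics
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;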
&lt;h2 id=&quot;the-grafana-dashboard&quot;&gt;The Grafana dashboard&lt;&#x2F;h2&gt;
&lt;p&gt;The bundled dashboard is more thorough than you’d expect from a “batteries included” module. It covers download and upload counts with rates, per-repository metrics, per-prefix breakdowns, HTTP latency by method with heatmaps, scheduler queue length, and storage lock latency.&lt;&#x2F;p&gt;
&lt;p&gt;You get it for free with &lt;code&gt;grafanaDashboard = true&lt;&#x2F;code&gt;. No JSON to download, no manual import. It provisions itself through Grafana’s provisioning system.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;using-the-registry&quot;&gt;Using the registry&lt;&#x2F;h2&gt;
&lt;p&gt;Once everything is deployed, the workflow is standard Docker:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;docker login registry.example.com -u ci-user
docker tag my-app:latest registry.example.com&amp;#x2F;myorg&amp;#x2F;my-app:latest
docker push registry.example.com&amp;#x2F;myorg&amp;#x2F;my-app:latest
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Pulling works the same way:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;docker pull registry.example.com&amp;#x2F;myorg&amp;#x2F;my-app:latest
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Zot’s built-in web UI is available at &lt;code&gt;https:&#x2F;&#x2F;registry.example.com&lt;&#x2F;code&gt;. It lets you browse repositories, inspect tags, and view vulnerability scan results. Nothing to install — it ships with the binary.&lt;&#x2F;p&gt;
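&lt;p&gt;If you’d rather confirm a push from the command line than from the web UI, the registry also speaks the standard OCI Distribution API. This sketch uses the example names above; &lt;code&gt;curl -u ci-user&lt;&#x2F;code&gt; prompts for the password:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# List repositories in the registry
curl -s -u ci-user https:&amp;#x2F;&amp;#x2F;registry.example.com&amp;#x2F;v2&amp;#x2F;_catalog

# List one repository&amp;#x27;s tags; after the push above, &amp;quot;latest&amp;quot; should be among them
curl -s -u ci-user https:&amp;#x2F;&amp;#x2F;registry.example.com&amp;#x2F;v2&amp;#x2F;myorg&amp;#x2F;my-app&amp;#x2F;tags&amp;#x2F;list
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;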
&lt;h2 id=&quot;full-example&quot;&gt;Full example&lt;&#x2F;h2&gt;
&lt;p&gt;Here’s a complete single-server configuration tying everything together — sops secrets, users, access control, retention, nginx with ACME, and local monitoring:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ config, ... }:
{
  sops.secrets = {
    zot_admin_password = { };
    zot_ci_password = { };
  };

  services.zot = {
    enable = true;
    dataDir = &amp;quot;&amp;#x2F;var&amp;#x2F;lib&amp;#x2F;zot&amp;quot;;

    # Nginx reverse proxy with Let&amp;#x27;s Encrypt
    nginx = {
      enable = true;
      domain = &amp;quot;registry.example.com&amp;quot;;
      forceSSL = true;
      acme = true;
    };

    # Declarative users — passwords from sops
    auth.users = {
      admin = {
        passwordFile = config.sops.secrets.zot_admin_password.path;
        admin = true;
      };
      ci-user = {
        passwordFile = config.sops.secrets.zot_ci_password.path;
      };
    };

    # Access control
    accessControl = {
      adminActions = [ &amp;quot;read&amp;quot; &amp;quot;create&amp;quot; &amp;quot;update&amp;quot; &amp;quot;delete&amp;quot; ];
      defaultPolicy = [ ];
      anonymousPolicy = [ ];
      repositories.&amp;quot;myorg&amp;#x2F;**&amp;quot; = {
        policies = [
          {
            users = [ &amp;quot;ci-user&amp;quot; ];
            actions = [ &amp;quot;read&amp;quot; &amp;quot;create&amp;quot; &amp;quot;update&amp;quot; &amp;quot;delete&amp;quot; ];
          }
        ];
        defaultPolicy = [ ];
        anonymousPolicy = [ &amp;quot;read&amp;quot; ];
      };
    };

    # Retention — keep things tidy
    retention = {
      dryRun = false;
      delay = &amp;quot;24h&amp;quot;;
      policies = [
        {
          repositories = [ &amp;quot;myorg&amp;#x2F;**&amp;quot; ];
          deleteReferrers = true;
          deleteUntagged = true;
          keepTags = [
            { patterns = [ &amp;quot;latest&amp;quot; ]; }
            { pushedWithin = &amp;quot;168h&amp;quot;; }
            { mostRecentlyPushedCount = 10; }
            { patterns = [ &amp;quot;v.*&amp;quot; ]; pulledWithin = &amp;quot;720h&amp;quot;; }
          ];
        }
      ];
      defaultPolicy = {
        deleteReferrers = false;
        deleteUntagged = true;
        keepTags = [
          { patterns = [ &amp;quot;.*&amp;quot; ]; }
        ];
      };
    };

    # Monitoring — local Prometheus and Grafana
    metrics.enable = true;
    enableLocalScraping = true;
    grafanaDashboard = true;
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The systemd service runs as a simple service type with &lt;code&gt;PrivateTmp&lt;&#x2F;code&gt;, &lt;code&gt;ProtectHome&lt;&#x2F;code&gt;, &lt;code&gt;NoNewPrivileges&lt;&#x2F;code&gt;, and a file descriptor limit of 500,000. The module handles all of that — you don’t need to think about it.&lt;&#x2F;p&gt;
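&lt;p&gt;If you do want to see what the module set up, systemd will tell you. This assumes the unit is named &lt;code&gt;zot.service&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Print the service type and the hardening options mentioned above
systemctl show zot.service -p Type -p PrivateTmp -p ProtectHome -p NoNewPrivileges -p LimitNOFILE

# Ask systemd for its own exposure assessment of the unit
systemd-analyze security zot.service
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;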
&lt;p&gt;That’s a private OCI registry with authentication, per-repo access control, automatic image cleanup, TLS, and full observability. One file, no imperative setup, and &lt;code&gt;nixos-rebuild switch&lt;&#x2F;code&gt; gets you there.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Everything You Can Set on macOS with nix-darwin</title>
        <published>2026-03-26T12:00:00+00:00</published>
        <updated>2026-03-26T12:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/everything-nix-darwin-macos/"/>
        <id>https://perlpimp.net/blog/everything-nix-darwin-macos/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/everything-nix-darwin-macos/">&lt;p&gt;Most people discover Nix on macOS through packages. &lt;code&gt;nix-env -iA nixpkgs.ripgrep&lt;&#x2F;code&gt;, a few shell tools, maybe a dev environment. Useful, but it leaves the rest of your system untouched — a sprawl of &lt;code&gt;defaults write&lt;&#x2F;code&gt; commands, System Settings clicks you can’t remember, and a Dock that resets itself after every migration.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;LnL7&#x2F;nix-darwin&quot;&gt;nix-darwin&lt;&#x2F;a&gt; changes the scope. It’s a NixOS-style module system for macOS: you describe your system in Nix, run &lt;code&gt;darwin-rebuild switch&lt;&#x2F;code&gt;, and it converges. Packages, preferences, services, keyboard remapping, Homebrew casks, Dock layout — all from one configuration. Wipe the machine, rebuild, and everything is back exactly as it was.&lt;&#x2F;p&gt;
&lt;p&gt;This isn’t a setup tutorial. It’s a tour of what’s actually available — with real config snippets you can drop into your own &lt;code&gt;darwin-configuration.nix&lt;&#x2F;code&gt; and adapt.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;system-defaults-the-big-one&quot;&gt;System defaults — the big one&lt;&#x2F;h2&gt;
&lt;p&gt;The &lt;code&gt;system.defaults&lt;&#x2F;code&gt; attrset is where nix-darwin really flexes. Most of the &lt;code&gt;defaults write&lt;&#x2F;code&gt; commands you’ve ever cargo-culted from a dotfiles repo have a typed, declarative equivalent here.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;NSGlobalDomain&lt;&#x2F;strong&gt; covers the settings that apply everywhere — dark mode, key repeat, scroll direction, that infuriating beep:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;system.defaults.NSGlobalDomain = {
  AppleInterfaceStyle = &amp;quot;Dark&amp;quot;;
  ApplePressAndHoldEnabled = false;
  KeyRepeat = 2;
  InitialKeyRepeat = 15;
  &amp;quot;com.apple.swipescrolldirection&amp;quot; = false;
  &amp;quot;com.apple.sound.beep.volume&amp;quot; = 0.0;
  &amp;quot;com.apple.sound.beep.feedback&amp;quot; = 0;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;ApplePressAndHoldEnabled = false&lt;&#x2F;code&gt; is the one that gives you key repeat instead of the accent character picker. If you’ve ever held down &lt;code&gt;j&lt;&#x2F;code&gt; in Vim and watched nothing happen, this is why.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Dock&lt;&#x2F;strong&gt; — hide it, size it, kill recents, disable space rearranging, set hot corners:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;system.defaults.dock = {
  autohide = false;
  tilesize = 48;
  show-recents = false;
  mru-spaces = false;
  minimize-to-application = true;
  expose-animation-duration = 0.2;
  # Hot corners: 4=Desktop, 5=Screensaver, 12=Notification Center, 14=Quick Note
  wvous-bl-corner = 4;
  wvous-br-corner = 14;
  wvous-tl-corner = 5;
  wvous-tr-corner = 12;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;mru-spaces = false&lt;&#x2F;code&gt; stops macOS from silently reordering your spaces based on usage. If you use numbered spaces and expect Space 3 to stay in position 3, you need this.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Finder&lt;&#x2F;strong&gt; — show hidden files, default to list view, add a quit menu item:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;system.defaults.finder = {
  AppleShowAllExtensions = true;
  AppleShowAllFiles = true;
  FXEnableExtensionChangeWarning = false;
  FXPreferredViewStyle = &amp;quot;Nlsv&amp;quot;;     # list view
  FXRemoveOldTrashItems = true;
  QuitMenuItem = true;
  ShowPathbar = true;
  ShowStatusBar = true;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;QuitMenuItem = true&lt;&#x2F;code&gt; adds a Quit Finder item (Cmd+Q) to Finder’s menu. Sounds minor until you realize that otherwise the only way to fully quit Finder is &lt;code&gt;killall Finder&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Trackpad&lt;&#x2F;strong&gt; — tap to click and three-finger drag:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;system.defaults.trackpad = {
  Clicking = true;
  TrackpadRightClick = true;
  TrackpadThreeFingerDrag = true;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Screen capture&lt;&#x2F;strong&gt; — change the save location, format, and kill that thumbnail preview:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;system.defaults.screencapture = {
  location = &amp;quot;~&amp;#x2F;Downloads&amp;quot;;
  type = &amp;quot;png&amp;quot;;
  disable-shadow = true;
  show-thumbnail = false;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Menu bar clock&lt;&#x2F;strong&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;system.defaults.menuExtraClock = {
  Show24Hour = true;
  ShowSeconds = false;
  ShowDate = 0;  # 0 = when space allows, 1 = always, 2 = never
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Login window&lt;&#x2F;strong&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;system.defaults.loginwindow = {
  GuestEnabled = false;
  DisableConsoleAccess = true;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Spaces across displays&lt;&#x2F;strong&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;system.defaults.spaces.spans-displays = true;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That’s a lot of &lt;code&gt;defaults write&lt;&#x2F;code&gt; commands you no longer need to remember.&lt;&#x2F;p&gt;
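&lt;p&gt;For comparison, here is roughly what three of the options above replace. The block also shows how to read values back after &lt;code&gt;darwin-rebuild switch&lt;&#x2F;code&gt; to confirm they landed. These are plain &lt;code&gt;defaults&lt;&#x2F;code&gt; commands; the exact commands nix-darwin runs internally may differ:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Imperative equivalents of KeyRepeat, mru-spaces, and QuitMenuItem
defaults write NSGlobalDomain KeyRepeat -int 2
defaults write com.apple.dock mru-spaces -bool false
defaults write com.apple.finder QuitMenuItem -bool true
killall Dock Finder   # restart both so they reload their preferences

# Read values back to check what is actually set
defaults read NSGlobalDomain KeyRepeat
defaults read com.apple.dock mru-spaces
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;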
&lt;h2 id=&quot;customuserpreferences-the-escape-hatch&quot;&gt;CustomUserPreferences — the escape hatch&lt;&#x2F;h2&gt;
&lt;p&gt;Not everything has a typed option in nix-darwin. &lt;code&gt;CustomUserPreferences&lt;&#x2F;code&gt; is a freeform attrset that maps directly to &lt;code&gt;defaults write&lt;&#x2F;code&gt; — any domain, any key. Think of it as the declarative version of your post-install shell script:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;system.defaults.CustomUserPreferences = {
  &amp;quot;com.apple.finder&amp;quot; = {
    ShowExternalHardDrivesOnDesktop = false;
    ShowRemovableMediaOnDesktop = false;
    _FXSortFoldersFirst = true;
    FXDefaultSearchScope = &amp;quot;SCcf&amp;quot;;  # search current folder
  };
  &amp;quot;com.apple.desktopservices&amp;quot; = {
    DSDontWriteNetworkStores = true;
    DSDontWriteUSBStores = true;
  };
  &amp;quot;com.apple.screensaver&amp;quot; = {
    askForPassword = 1;
    askForPasswordDelay = 0;
  };
  &amp;quot;com.apple.AdLib&amp;quot;.allowApplePersonalizedAdvertising = false;
  &amp;quot;com.apple.SoftwareUpdate&amp;quot; = {
    AutomaticCheckEnabled = true;
    ScheduleFrequency = 1;
    AutomaticDownload = 1;
    CriticalUpdateInstall = 1;
  };
  &amp;quot;com.apple.TimeMachine&amp;quot;.DoNotOfferNewDisksForBackup = true;
  &amp;quot;com.apple.WindowManager&amp;quot; = {
    HideDesktop = true;
    StandardHideDesktopIcons = true;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;DSDontWriteNetworkStores&lt;&#x2F;code&gt; and &lt;code&gt;DSDontWriteUSBStores&lt;&#x2F;code&gt; stop macOS from scattering &lt;code&gt;.DS_Store&lt;&#x2F;code&gt; files across every network share and USB drive you mount. If you’ve ever committed a &lt;code&gt;.DS_Store&lt;&#x2F;code&gt; to a repo, you know why this matters.&lt;&#x2F;p&gt;
&lt;p&gt;You can also disable keyboard shortcuts like Ctrl+Space (input source switching) through &lt;code&gt;com.apple.symbolichotkeys&lt;&#x2F;code&gt; if you need that binding for something else.&lt;&#x2F;p&gt;
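&lt;p&gt;Here is a minimal sketch of that. It assumes hotkey ID 60 is “Select the previous input source” (Ctrl+Space), which is the ID commonly cited for it; verify it on your macOS version. Because this sets the whole &lt;code&gt;AppleSymbolicHotKeys&lt;&#x2F;code&gt; value, it may overwrite other customized shortcuts. The change may also not apply until you log out and back in:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;system.defaults.CustomUserPreferences.&amp;quot;com.apple.symbolichotkeys&amp;quot; = {
  AppleSymbolicHotKeys = {
    # 60: &amp;quot;Select the previous input source&amp;quot; (Ctrl+Space by default)
    &amp;quot;60&amp;quot; = { enabled = false; };
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;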
&lt;h2 id=&quot;keyboard-remapping&quot;&gt;Keyboard remapping&lt;&#x2F;h2&gt;
&lt;p&gt;Caps Lock as Control. Two lines, no Karabiner:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;system.keyboard = {
  enableKeyMapping = true;
  remapCapsLockToControl = true;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This uses the system-level key remapping — it works everywhere, including the login screen.&lt;&#x2F;p&gt;
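&lt;p&gt;To check that the remap is active, query &lt;code&gt;hidutil&lt;&#x2F;code&gt; for the current key mappings. This assumes nix-darwin applies the mapping through macOS’s &lt;code&gt;hidutil&lt;&#x2F;code&gt;, as its keyboard module currently does:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Lists active remaps; expect an entry mapping Caps Lock to Control
hidutil property --get &amp;quot;UserKeyMapping&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;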
&lt;h2 id=&quot;touch-id-for-sudo&quot;&gt;Touch ID for sudo&lt;&#x2F;h2&gt;
&lt;p&gt;One line. No more hand-editing files in &lt;code&gt;&#x2F;etc&#x2F;pam.d&lt;&#x2F;code&gt; and hoping a macOS update doesn’t overwrite them:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;security.pam.services.sudo_local.touchIdAuth = true;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Every &lt;code&gt;sudo&lt;&#x2F;code&gt; prompt becomes a fingerprint tap. This is the single most satisfying line in my entire nix-darwin config.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;declarative-homebrew&quot;&gt;Declarative Homebrew&lt;&#x2F;h2&gt;
&lt;p&gt;Nix handles most packages, but some macOS GUI apps only exist as Homebrew casks, and some things only live on the Mac App Store. nix-darwin can manage your Homebrew setup declaratively. That covers casks, taps, MAS apps, and automatic cleanup of anything not declared. Homebrew itself still has to be installed separately; nix-darwin drives it but doesn’t install it.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;homebrew = {
  enable = true;
  caskArgs.no_quarantine = true;
  onActivation = {
    autoUpdate = true;
    cleanup = &amp;quot;uninstall&amp;quot;;  # remove anything not declared
    upgrade = true;
  };
  casks = [ &amp;quot;firefox&amp;quot; &amp;quot;notion&amp;quot; &amp;quot;slack&amp;quot; &amp;quot;discord&amp;quot; ];
  masApps = {
    &amp;quot;WhatsApp&amp;quot; = 310633997;
    &amp;quot;Xcode&amp;quot; = 497799835;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;cleanup = &quot;uninstall&quot;&lt;&#x2F;code&gt; is the key line. Any formula or cask installed through Homebrew that isn’t declared in your configuration gets removed on rebuild. Your Homebrew setup only has what you’ve declared, nothing more.&lt;&#x2F;p&gt;
&lt;p&gt;Those numeric IDs come from the Mac App Store. Look them up with &lt;code&gt;mas&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;sh&quot; class=&quot;language-sh &quot;&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;mas search WhatsApp
# 310633997  WhatsApp Messenger  (24.9.80)

mas search Xcode
# 497799835  Xcode               (16.3)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The number on the left is what goes in your &lt;code&gt;masApps&lt;&#x2F;code&gt; attrset. You can also grab it from any App Store URL — it’s the number after &lt;code&gt;&#x2F;id&lt;&#x2F;code&gt; in &lt;code&gt;https:&#x2F;&#x2F;apps.apple.com&#x2F;app&#x2F;whatsapp-messenger&#x2F;id310633997&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
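&lt;p&gt;For apps that are already installed, &lt;code&gt;mas&lt;&#x2F;code&gt; can print their IDs directly. That is handy when moving an existing Mac over to a declarative config:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;sh&quot; class=&quot;language-sh &quot;&gt;&lt;code class=&quot;language-sh&quot; data-lang=&quot;sh&quot;&gt;# One line per App Store app: ID, name, and version
mas list
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;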
&lt;h2 id=&quot;nix-settings-binary-caches-and-remote-builders&quot;&gt;Nix settings — binary caches and remote builders&lt;&#x2F;h2&gt;
&lt;p&gt;Flakes, substituters, distributed builds to a Linux box:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;nix.settings = {
  experimental-features = [ &amp;quot;nix-command&amp;quot; &amp;quot;flakes&amp;quot; ];
  trusted-users = [ &amp;quot;root&amp;quot; &amp;quot;@admin&amp;quot; &amp;quot;youruser&amp;quot; ];
  substituters = [ &amp;quot;https:&amp;#x2F;&amp;#x2F;cache.garnix.io&amp;quot; ];
  trusted-public-keys = [
    &amp;quot;cache.garnix.io:CTFPyKSLcx5RMJKfLo5EEPUObbA78b0YQ2DTCJXqr9g=&amp;quot;
  ];
};

nix.distributedBuilds = true;
nix.buildMachines = [{
  hostName = &amp;quot;10.0.0.50&amp;quot;;
  systems = [ &amp;quot;x86_64-linux&amp;quot; &amp;quot;aarch64-linux&amp;quot; ];
  protocol = &amp;quot;ssh-ng&amp;quot;;
  maxJobs = 64;
  supportedFeatures = [ &amp;quot;nixos-test&amp;quot; &amp;quot;benchmark&amp;quot; &amp;quot;big-parallel&amp;quot; &amp;quot;kvm&amp;quot; ];
}];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That &lt;code&gt;trusted-users&lt;&#x2F;code&gt; line is worth a closer look. Nix builds run as the &lt;code&gt;nixbld&lt;&#x2F;code&gt; daemon user, not as you — so by default the builder can’t see your SSH keys or GitHub tokens. If your flake inputs point at private repos, the build will fail with authentication errors.&lt;&#x2F;p&gt;
&lt;p&gt;Adding yourself (or &lt;code&gt;@admin&lt;&#x2F;code&gt;) to &lt;code&gt;trusted-users&lt;&#x2F;code&gt; lets the daemon inherit your user-level access tokens. Pair it with &lt;code&gt;nix.extraOptions&lt;&#x2F;code&gt; to tell the daemon where to find your Git credentials:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;nix.extraOptions = &amp;#x27;&amp;#x27;
  !include &amp;#x2F;etc&amp;#x2F;nix&amp;#x2F;access-tokens.conf
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then drop a file at &lt;code&gt;&#x2F;etc&#x2F;nix&#x2F;access-tokens.conf&lt;&#x2F;code&gt; — outside the store, so it stays out of world-readable &lt;code&gt;&#x2F;nix&#x2F;store&lt;&#x2F;code&gt; — with:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;access-tokens = github.com=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Now &lt;code&gt;nix build&lt;&#x2F;code&gt; and &lt;code&gt;darwin-rebuild switch&lt;&#x2F;code&gt; can pull private flake inputs without manual &lt;code&gt;git clone&lt;&#x2F;code&gt;. The token never lands in the Nix store and you can rotate it in one place.&lt;&#x2F;p&gt;
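&lt;p&gt;The remote builder entry above also leaves out how the connection is authenticated. Builds are dispatched by the root-owned &lt;code&gt;nix-daemon&lt;&#x2F;code&gt;, so the SSH key has to be usable by root. On the Linux side, the SSH user typically needs to be in that machine’s &lt;code&gt;trusted-users&lt;&#x2F;code&gt;. Here is a sketch with a hypothetical &lt;code&gt;builder&lt;&#x2F;code&gt; account and key path:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;nix.buildMachines = [{
  hostName = &amp;quot;10.0.0.50&amp;quot;;
  sshUser = &amp;quot;builder&amp;quot;;                         # hypothetical account on the Linux box
  sshKey = &amp;quot;&amp;#x2F;etc&amp;#x2F;nix&amp;#x2F;builder_ed25519&amp;quot;;  # hypothetical key, readable by root
  systems = [ &amp;quot;x86_64-linux&amp;quot; &amp;quot;aarch64-linux&amp;quot; ];
  protocol = &amp;quot;ssh-ng&amp;quot;;
  maxJobs = 64;
  supportedFeatures = [ &amp;quot;nixos-test&amp;quot; &amp;quot;benchmark&amp;quot; &amp;quot;big-parallel&amp;quot; &amp;quot;kvm&amp;quot; ];
}];

# Let the builder fetch dependencies from binary caches itself
# instead of copying them over from the Mac
nix.settings.builders-use-substitutes = true;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;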
&lt;h2 id=&quot;services&quot;&gt;Services&lt;&#x2F;h2&gt;
&lt;p&gt;nix-darwin can manage launchd services the same way NixOS manages systemd units. OpenSSH is the most common:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.openssh = {
  enable = true;
  extraConfig = &amp;#x27;&amp;#x27;
    PasswordAuthentication no
    KbdInteractiveAuthentication no
    PermitRootLogin no
  &amp;#x27;&amp;#x27;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;But the service catalog goes well beyond SSH — yabai, skhd, sketchybar, lorri, and the nix-daemon itself are all managed this way.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;activation-scripts&quot;&gt;Activation scripts&lt;&#x2F;h2&gt;
&lt;p&gt;Need to run arbitrary shell commands on every &lt;code&gt;darwin-rebuild switch&lt;&#x2F;code&gt;? Activation scripts:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;system.activationScripts.postActivation.text = &amp;#x27;&amp;#x27;
  chsh -s &amp;#x2F;run&amp;#x2F;current-system&amp;#x2F;sw&amp;#x2F;bin&amp;#x2F;fish youruser
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This is the escape hatch for anything nix-darwin doesn’t have a module for. It runs as root during activation, so you can set ownership, create directories, modify system files — whatever the rebuild needs to converge.&lt;&#x2F;p&gt;
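&lt;p&gt;Activation scripts run on every switch, so it pays to make them idempotent. The sketch below changes the shell only when needed and registers fish in &lt;code&gt;&#x2F;etc&#x2F;shells&lt;&#x2F;code&gt; through nix-darwin’s &lt;code&gt;environment.shells&lt;&#x2F;code&gt;. As before, &lt;code&gt;youruser&lt;&#x2F;code&gt; is a placeholder:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ pkgs, ... }:
{
  programs.fish.enable = true;
  environment.shells = [ pkgs.fish ];  # adds fish to &amp;#x2F;etc&amp;#x2F;shells

  system.activationScripts.postActivation.text = &amp;#x27;&amp;#x27;
    # Skip chsh when the login shell is already fish
    current=$(dscl . -read &amp;#x2F;Users&amp;#x2F;youruser UserShell | awk &amp;#x27;{print $2}&amp;#x27;)
    if [ &amp;quot;$current&amp;quot; != &amp;quot;&amp;#x2F;run&amp;#x2F;current-system&amp;#x2F;sw&amp;#x2F;bin&amp;#x2F;fish&amp;quot; ]; then
      chsh -s &amp;#x2F;run&amp;#x2F;current-system&amp;#x2F;sw&amp;#x2F;bin&amp;#x2F;fish youruser
    fi
  &amp;#x27;&amp;#x27;;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;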
&lt;h2 id=&quot;dock-entries&quot;&gt;Dock entries&lt;&#x2F;h2&gt;
&lt;p&gt;nix-darwin manages Dock contents natively through &lt;code&gt;system.defaults.dock.persistent-apps&lt;&#x2F;code&gt; — no third-party tools like &lt;code&gt;dockutil&lt;&#x2F;code&gt; needed. Just list your apps:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;system.defaults.dock.persistent-apps = [
  &amp;quot;&amp;#x2F;Applications&amp;#x2F;Safari.app&amp;quot;
  &amp;quot;&amp;#x2F;Applications&amp;#x2F;Slack.app&amp;quot;
  &amp;quot;&amp;#x2F;System&amp;#x2F;Applications&amp;#x2F;Music.app&amp;quot;
];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Strings auto-coerce to app entries. For folders or spacers, use the tagged form: &lt;code&gt;{ folder = &quot;&#x2F;path&quot;; }&lt;&#x2F;code&gt; or &lt;code&gt;{ spacer = { small = true; }; }&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
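&lt;p&gt;Mixing the two forms, a layout with a small spacer between two groups of apps might look like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;system.defaults.dock.persistent-apps = [
  &amp;quot;&amp;#x2F;Applications&amp;#x2F;Safari.app&amp;quot;
  &amp;quot;&amp;#x2F;Applications&amp;#x2F;Slack.app&amp;quot;
  { spacer = { small = true; }; }  # small gap between the groups
  &amp;quot;&amp;#x2F;System&amp;#x2F;Applications&amp;#x2F;Music.app&amp;quot;
];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;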
&lt;p&gt;No more dragging icons around after a fresh install. The Dock shows exactly what you declared — nothing more, nothing less.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;everything-else&quot;&gt;Everything else&lt;&#x2F;h2&gt;
&lt;p&gt;There’s more than fits in one post. A quick survey of other nix-darwin options worth exploring:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;fonts.packages&lt;&#x2F;code&gt; — declarative font installation, no more dragging &lt;code&gt;.ttf&lt;&#x2F;code&gt; files into Font Book&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;services.yabai&lt;&#x2F;code&gt; &#x2F; &lt;code&gt;services.skhd&lt;&#x2F;code&gt; — tiling window management&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;services.sketchybar&lt;&#x2F;code&gt; — custom menu bar replacement&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;services.jankyborders&lt;&#x2F;code&gt; — window borders for tiling setups&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;launchd.user.agents&lt;&#x2F;code&gt; &#x2F; &lt;code&gt;launchd.daemons&lt;&#x2F;code&gt; — custom launchd services for anything&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;programs.fish&lt;&#x2F;code&gt; &#x2F; &lt;code&gt;programs.zsh&lt;&#x2F;code&gt; — shell configuration&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;power&lt;&#x2F;code&gt; — sleep and wake settings&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;system.defaults.universalaccess&lt;&#x2F;code&gt; — accessibility settings&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;system.defaults.WindowManager&lt;&#x2F;code&gt; — Stage Manager configuration&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;system.defaults.controlcenter&lt;&#x2F;code&gt; — control center visibility&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;networking.dns&lt;&#x2F;code&gt; &#x2F; &lt;code&gt;networking.search&lt;&#x2F;code&gt; — DNS configuration&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;system.defaults.smb&lt;&#x2F;code&gt; — SMB&#x2F;network settings&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
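&lt;p&gt;To give a feel for the launchd options, here is a sketch of a user agent that runs a command every hour. The agent name, command, and log path are hypothetical; &lt;code&gt;serviceConfig&lt;&#x2F;code&gt; takes the same keys as a launchd plist:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;launchd.user.agents.notes-backup = {
  serviceConfig = {
    # Hypothetical job: mirror a notes folder to an external drive
    ProgramArguments = [
      &amp;quot;&amp;#x2F;usr&amp;#x2F;bin&amp;#x2F;rsync&amp;quot; &amp;quot;-a&amp;quot;
      &amp;quot;&amp;#x2F;Users&amp;#x2F;youruser&amp;#x2F;notes&amp;#x2F;&amp;quot; &amp;quot;&amp;#x2F;Volumes&amp;#x2F;Backup&amp;#x2F;notes&amp;#x2F;&amp;quot;
    ];
    StartInterval = 3600;  # seconds between runs
    RunAtLoad = true;      # also run once when the agent loads
    StandardOutPath = &amp;quot;&amp;#x2F;tmp&amp;#x2F;notes-backup.log&amp;quot;;
    StandardErrorPath = &amp;quot;&amp;#x2F;tmp&amp;#x2F;notes-backup.log&amp;quot;;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;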
&lt;h2 id=&quot;the-full-picture&quot;&gt;The full picture&lt;&#x2F;h2&gt;
&lt;p&gt;The point isn’t any single option. It’s that your Mac — the preferences, the services, the packages, the Dock, the keyboard, the shell — is described in one place. You can diff it, review it, roll it back, and hand it to a new machine.&lt;&#x2F;p&gt;
&lt;p&gt;The best way to discover what’s available is the &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;searchix.alanpearce.eu&#x2F;options&#x2F;darwin&#x2F;search&quot;&gt;nix-darwin option search&lt;&#x2F;a&gt;. Type a keyword, see what’s declarative. You’ll be surprised how much of System Settings has a Nix equivalent.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;darwin-rebuild switch&lt;&#x2F;code&gt; and your Mac is exactly the machine you described.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>nix run and nix develop — Try Anything Without Installing It</title>
        <published>2026-03-25T12:00:00+00:00</published>
        <updated>2026-03-25T12:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/nix-run-develop-try-anything/"/>
        <id>https://perlpimp.net/blog/nix-run-develop-try-anything/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/nix-run-develop-try-anything/">&lt;p&gt;Every package manager works the same way. You hear about a tool, you install it, you try it, and then you either keep it or forget to uninstall it. Six months later you’re staring at &lt;code&gt;brew list&lt;&#x2F;code&gt; wondering what half of these things are and whether removing them will break something.&lt;&#x2F;p&gt;
&lt;p&gt;Nix inverts this. The default is impermanence. You run a program, it runs, and it never lands on your PATH. There’s no install step and no uninstall step. The only thing left behind is a cached copy in the Nix store that garbage collection can reclaim. If you want to keep it, that’s a separate, deliberate decision.&lt;&#x2F;p&gt;
&lt;p&gt;This is one of those features that sounds like a minor convenience until you use it for a week and can’t go back.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;tier-1-one-shot-commands-from-nixpkgs&quot;&gt;Tier 1: one-shot commands from nixpkgs&lt;&#x2F;h2&gt;
&lt;p&gt;The simplest form. You want to run a program — once, right now, without thinking about it:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Try btop without installing it
nix run nixpkgs#btop

# Try the Helix editor
nix run nixpkgs#helix

# Need jq for one pipeline?
nix run nixpkgs#jq -- &amp;#x27;.[] | .name&amp;#x27; &amp;lt; data.json

# Check disk usage with dust instead of du
nix run nixpkgs#dust

# Try ripgrep before deciding if it replaces grep
nix run nixpkgs#ripgrep -- &amp;quot;pattern&amp;quot; .&amp;#x2F;src

# Quick look at a CSV file
nix run nixpkgs#visidata -- data.csv

# Need to convert an image, once
nix run nixpkgs#imagemagick -- convert input.png -resize 50% output.png

# Quick HTTP server for the current directory
nix run nixpkgs#python3 -- -m http.server 8080
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;--&lt;&#x2F;code&gt; separates Nix’s flags from the program’s flags. Everything after &lt;code&gt;--&lt;&#x2F;code&gt; gets passed through to the program itself.&lt;&#x2F;p&gt;
&lt;p&gt;First run fetches and caches the package. Second run is instant — the binary is already in the Nix store. But it never appears in &lt;code&gt;which&lt;&#x2F;code&gt;, never shows up in your package list, never conflicts with anything. It’s there when you ask for it and invisible when you don’t.&lt;&#x2F;p&gt;
&lt;p&gt;The &lt;code&gt;nixpkgs#&lt;&#x2F;code&gt; prefix means “from the nixpkgs flake.” That’s the same package set every NixOS system is built from, and it holds well over 100,000 packages. If it’s packaged for Linux, it’s probably in nixpkgs.&lt;&#x2F;p&gt;
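&lt;p&gt;To find the name that goes after &lt;code&gt;nixpkgs#&lt;&#x2F;code&gt;, search the package set. The first search has to fetch and evaluate nixpkgs, so expect it to be slow:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Search package names and descriptions
nix search nixpkgs ripgrep

# The last component of the attribute path it prints is the name to use
nix run nixpkgs#ripgrep -- --version
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;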
&lt;h2 id=&quot;tier-2-running-directly-from-github&quot;&gt;Tier 2: running directly from GitHub&lt;&#x2F;h2&gt;
&lt;p&gt;Any project that ships a &lt;code&gt;flake.nix&lt;&#x2F;code&gt; with a &lt;code&gt;packages&lt;&#x2F;code&gt; or &lt;code&gt;apps&lt;&#x2F;code&gt; output can be run straight from its repository. No clone, no build setup, no README scavenger hunt:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Ghostty — GPU-accelerated terminal emulator
nix run github:ghostty-org&amp;#x2F;ghostty

# OpenCode — terminal-based AI coding assistant
nix run github:anomalyco&amp;#x2F;opencode

# NixVim — full Neovim distribution configured with Nix
nix run github:nix-community&amp;#x2F;nixvim
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
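&lt;p&gt;You can pin to a branch, tag, or commit hash, all of which go in the same slot after the repository name. You can also pick a specific output from a multi-output flake:&lt;&#x2F;p&gt;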
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Specific tag
nix run github:ghostty-org&amp;#x2F;ghostty&amp;#x2F;v1.0.0

# Specific branch
nix run github:ghostty-org&amp;#x2F;ghostty&amp;#x2F;main

# Specific app from a multi-output flake
nix run github:org&amp;#x2F;repo#specific-app
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The URL anatomy, for reference:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;nix run github:&amp;lt;owner&amp;gt;&amp;#x2F;&amp;lt;repo&amp;gt;&amp;#x2F;&amp;lt;ref&amp;gt;#&amp;lt;app&amp;gt;
         │       │      │     │      │
         │       │      │     │      └── flake app output (optional, defaults to &amp;quot;default&amp;quot;)
         │       │      │     └── branch, tag, or commit (optional)
         │       │      └── repository name
         │       └── GitHub user or org
         └── flake URL scheme
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This is genuinely powerful. Someone posts a link to a project, you &lt;code&gt;nix run&lt;&#x2F;code&gt; it, and thirty seconds later you’re using it. No installation instructions, no dependency conflicts, no “works on my machine.” The flake defines exactly what gets built and how.&lt;&#x2F;p&gt;
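&lt;p&gt;Before running an unfamiliar flake, you can list what it exposes. This also tells you which names are valid after the &lt;code&gt;#&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Show a flake&amp;#x27;s outputs (packages, apps, devShells, ...) without building them
nix flake show github:nix-community&amp;#x2F;nixvim
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;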
&lt;h2 id=&quot;tier-3-full-dev-environments-with-nix-develop&quot;&gt;Tier 3: full dev environments with nix develop&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;code&gt;nix run&lt;&#x2F;code&gt; executes a single program. &lt;code&gt;nix develop&lt;&#x2F;code&gt; drops you into the project’s entire development environment — compilers, libraries, tools, environment variables, the lot.&lt;&#x2F;p&gt;
&lt;p&gt;You can do this without even cloning the repo:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Enter the dev shell of a GitHub project directly
nix develop github:ghostty-org&amp;#x2F;ghostty

# Specific dev shell from a multi-shell flake
nix develop github:org&amp;#x2F;repo#shell-name
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Or from a local checkout, which is the more common case:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;git clone https:&amp;#x2F;&amp;#x2F;github.com&amp;#x2F;user&amp;#x2F;my-project &amp;amp;&amp;amp; cd my-project
nix develop
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Inside the shell, everything the project’s &lt;code&gt;flake.nix&lt;&#x2F;code&gt; declares is available. The right compiler version, the right formatter, the right linter, any helper scripts — all in PATH. Leave the shell and they vanish. No global state touched.&lt;&#x2F;p&gt;
&lt;p&gt;This is what makes onboarding to a Nix-based project feel like cheating. No “install these seven things first” document. No version mismatches. Run &lt;code&gt;nix develop&lt;&#x2F;code&gt; and you’re ready. If it built on CI from the same &lt;code&gt;flake.lock&lt;&#x2F;code&gt;, it builds on your machine, because it’s the same environment.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;direnv-the-invisible-version&quot;&gt;direnv: the invisible version&lt;&#x2F;h2&gt;
&lt;p&gt;Typing &lt;code&gt;nix develop&lt;&#x2F;code&gt; every time you &lt;code&gt;cd&lt;&#x2F;code&gt; into a project gets old. direnv automates it. Add a &lt;code&gt;.envrc&lt;&#x2F;code&gt; to the project root:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;use flake
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
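&lt;p&gt;Approve the file once with &lt;code&gt;direnv allow&lt;&#x2F;code&gt;; direnv won’t load an &lt;code&gt;.envrc&lt;&#x2F;code&gt; until you do. From then on, the dev shell activates when you enter the directory and deactivates when you leave. No manual shell entry, no remembering which project needs what. Your terminal just has the right tools available, always.&lt;&#x2F;p&gt;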
&lt;p&gt;This combines especially well with &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;nix-community&#x2F;nix-direnv&quot;&gt;nix-direnv&lt;&#x2F;a&gt;, which caches the dev shell so re-entering the directory doesn’t trigger a rebuild. First entry takes a few seconds. Every entry after that is instant.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;nix-shell-vs-nix-run&quot;&gt;nix shell vs nix run&lt;&#x2F;h2&gt;
&lt;p&gt;These two look similar but serve different purposes:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# nix run: executes the program&amp;#x27;s default command, then you&amp;#x27;re done
nix run nixpkgs#htop

# nix shell: adds packages to PATH, then drops you into a shell
nix shell nixpkgs#imagemagick
convert input.png output.jpg   # &amp;#x27;convert&amp;#x27; is now in PATH
mogrify -resize 50% *.png      # so is every other imagemagick command
exit                           # leave the shell, they&amp;#x27;re gone
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;nix run&lt;&#x2F;code&gt; is for “I want to use this program right now.” &lt;code&gt;nix shell&lt;&#x2F;code&gt; is for “I need these tools available for a while.” You can also combine packages in a single shell:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix shell nixpkgs#ffmpeg nixpkgs#imagemagick nixpkgs#jq
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Three packages, one ephemeral environment. Useful when you’re doing some ad-hoc media processing and need a few tools together without setting up a flake.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;comma-the-laziest-possible-workflow&quot;&gt;Comma: the laziest possible workflow&lt;&#x2F;h2&gt;
&lt;p&gt;If even &lt;code&gt;nix run nixpkgs#&lt;&#x2F;code&gt; is too much typing, &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;nix-community&#x2F;comma&quot;&gt;comma&lt;&#x2F;a&gt; exists:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Install comma once
nix profile install nixpkgs#comma

# Then just prefix any command with a comma
, btop
, dust .&amp;#x2F;
, cowsay &amp;quot;hello from nix&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Comma looks up the command in a nix-index database, finds the package that provides it, and runs it. No &lt;code&gt;nixpkgs#&lt;&#x2F;code&gt; prefix, no knowing the package name — just the command you want.&lt;&#x2F;p&gt;
&lt;p&gt;The trade-off is that you need to build the nix-index database first (&lt;code&gt;nix run nixpkgs#nix-index&lt;&#x2F;code&gt;), and it can return the wrong package if multiple packages provide the same command name. But for casual use — “I want to try that tool I saw on Hacker News” — it’s hard to beat.&lt;&#x2F;p&gt;
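&lt;p&gt;The lookup itself is simple enough to model in a few lines. This is a toy, not comma’s implementation. The index here is a hypothetical in-memory map from package name to the executables it installs, standing in for the nix-index database. &lt;code&gt;providers&lt;&#x2F;code&gt; returns every match, so the ambiguity described above stays visible instead of being resolved silently.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;use std::collections::BTreeMap;

&amp;#x2F;&amp;#x2F;&amp;#x2F; Hypothetical stand-in for a nix-index database:
&amp;#x2F;&amp;#x2F;&amp;#x2F; package name -&amp;gt; executables it installs under bin&amp;#x2F;.
pub type Index&amp;lt;&amp;#x27;a&amp;gt; = BTreeMap&amp;lt;&amp;amp;&amp;#x27;a str, Vec&amp;lt;&amp;amp;&amp;#x27;a str&amp;gt;&amp;gt;;

&amp;#x2F;&amp;#x2F;&amp;#x2F; Every package that provides `command`, in sorted order. An empty result
&amp;#x2F;&amp;#x2F;&amp;#x2F; means &amp;quot;not found&amp;quot;; more than one entry means the choice is ambiguous.
pub fn providers&amp;lt;&amp;#x27;a&amp;gt;(index: &amp;amp;Index&amp;lt;&amp;#x27;a&amp;gt;, command: &amp;amp;str) -&amp;gt; Vec&amp;lt;&amp;amp;&amp;#x27;a str&amp;gt; {
    index
        .iter()
        .filter(|(_, bins)| bins.iter().any(|b| *b == command))
        .map(|(pkg, _)| *pkg)
        .collect()
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Suppose two packages both install an &lt;code&gt;nc&lt;&#x2F;code&gt; binary. &lt;code&gt;providers&lt;&#x2F;code&gt; returns both names, and any tool built on a lookup like this has to either ask the user or pick one.&lt;&#x2F;p&gt;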
&lt;h2 id=&quot;the-try-before-you-install-workflow&quot;&gt;The try-before-you-install workflow&lt;&#x2F;h2&gt;
&lt;p&gt;Put it all together and you get a fundamentally different relationship with your tools:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Discover&lt;&#x2F;strong&gt; — hear about a tool, &lt;code&gt;nix run nixpkgs#tool&lt;&#x2F;code&gt; to try it&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Evaluate&lt;&#x2F;strong&gt; — use it for a few days via &lt;code&gt;nix shell&lt;&#x2F;code&gt; or comma&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Commit&lt;&#x2F;strong&gt; — if it’s worth keeping, add it to your NixOS configuration or home-manager&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Or don’t&lt;&#x2F;strong&gt; — do nothing; the fetched build stays in the Nix store until the next garbage collection removes it&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;The default is impermanence. You only permanently install things you’ve already decided are worth it. There’s no “I installed this six months ago and forgot” because you never installed it in the first place.&lt;&#x2F;p&gt;
&lt;p&gt;This also changes how you think about tool recommendations. Someone says “try hyperfine for benchmarking”? That’s a ten-second experiment, not a commitment. &lt;code&gt;nix run nixpkgs#hyperfine -- --warmup 3 &#x27;my-command&#x27;&lt;&#x2F;code&gt; — either it’s useful or it isn’t, and either way your system is unchanged.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;when-it-won-t-work&quot;&gt;When it won’t work&lt;&#x2F;h2&gt;
&lt;p&gt;Not everything is &lt;code&gt;nix run&lt;&#x2F;code&gt;-able:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Libraries, fonts, and packages without executables&lt;&#x2F;strong&gt; — there’s nothing to run. Use &lt;code&gt;nix shell&lt;&#x2F;code&gt; or &lt;code&gt;nix build&lt;&#x2F;code&gt; instead.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Programs that need system integration&lt;&#x2F;strong&gt; — a window manager, a display server, things that need to register with D-Bus or set up system services. These need proper installation.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Projects without a flake.nix&lt;&#x2F;strong&gt; — the &lt;code&gt;github:&lt;&#x2F;code&gt; URL scheme requires a flake. An increasing number of projects have one, but plenty don’t. For Nix projects without a flake, you can run the classic &lt;code&gt;nix-build&lt;&#x2F;code&gt; and &lt;code&gt;nix-shell&lt;&#x2F;code&gt; commands against a checkout’s &lt;code&gt;default.nix&lt;&#x2F;code&gt; or &lt;code&gt;shell.nix&lt;&#x2F;code&gt;, or use the new CLI’s &lt;code&gt;--file&lt;&#x2F;code&gt; flag. That’s a different workflow.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Large builds from source&lt;&#x2F;strong&gt; — &lt;code&gt;nix run github:some&#x2F;project&lt;&#x2F;code&gt; might need to compile the entire thing if there’s no binary cache. That’s not a quick experiment, that’s a coffee break. Projects that publish their builds to &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.cachix.org&#x2F;&quot;&gt;Cachix&lt;&#x2F;a&gt; can avoid this. So can anything already in nixpkgs, which is normally fetched prebuilt from the official NixOS binary cache.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;These are real limitations, but they’re the edges. For the vast majority of CLI tools, utilities, and development environments, the workflow just works. And once you’re used to it, going back to &lt;code&gt;brew install&lt;&#x2F;code&gt; &#x2F; &lt;code&gt;brew uninstall&lt;&#x2F;code&gt; &#x2F; &lt;code&gt;brew cleanup&lt;&#x2F;code&gt; feels like paperwork.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Syncing qBittorrent Ports with ProtonVPN NAT-PMP on NixOS</title>
        <published>2026-03-24T12:00:00+00:00</published>
        <updated>2026-03-24T12:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/qbittorrent-protonvpn-natpmp-nixos/"/>
        <id>https://perlpimp.net/blog/qbittorrent-protonvpn-natpmp-nixos/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/qbittorrent-protonvpn-natpmp-nixos/">&lt;p&gt;Everything works until the port changes. You’re running qBittorrent behind ProtonVPN, port forwarding is enabled, peers are connecting — and then the VPN gateway silently reassigns your external port. qBittorrent doesn’t know. Peers try the old port, get nothing, and your swarm participation drops to zero. You might not notice for hours.&lt;&#x2F;p&gt;
&lt;p&gt;ProtonVPN handles port forwarding through NAT-PMP (RFC 6886). The gateway assigns an external port dynamically — it changes on reconnect and can change mid-session. There’s no static port option. If you want incoming connections, something needs to continuously track the assigned port and tell qBittorrent about it.&lt;&#x2F;p&gt;
&lt;p&gt;Manual configuration doesn’t survive a single VPN reconnect. You need a daemon.&lt;&#x2F;p&gt;
&lt;p&gt;I wrote one — &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;ijohanne&#x2F;proton-port-sync&quot;&gt;proton-port-sync&lt;&#x2F;a&gt;. If you just want to use it, the README has everything you need. This post is about how it works and why it’s built the way it is.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-not-the-natpmp-crate&quot;&gt;Why not the natpmp crate&lt;&#x2F;h2&gt;
&lt;p&gt;The Rust &lt;code&gt;natpmp&lt;&#x2F;code&gt; crate exists and implements the protocol. It has one critical problem with policy-based routing.&lt;&#x2F;p&gt;
&lt;p&gt;When you run WireGuard with policy routing — say, routing table 51820 matching traffic from &lt;code&gt;10.2.0.2&lt;&#x2F;code&gt; — the &lt;code&gt;natpmp&lt;&#x2F;code&gt; crate binds its UDP socket to &lt;code&gt;0.0.0.0:0&lt;&#x2F;code&gt;. The NAT-PMP request goes out over the VPN tunnel correctly, but the response from the gateway doesn’t route back. The kernel sees a packet for an unbound socket with no interface affinity and has no reason to route it through the VPN’s routing table. The request times out and you’re debugging network issues that don’t exist.&lt;&#x2F;p&gt;
&lt;p&gt;The fix: bind the UDP socket directly to the WireGuard interface IP. That’s the whole trick — the socket is now associated with the VPN interface, responses traverse the correct routing table, and everything works. It’s obvious in hindsight and confusing for about fifteen minutes.&lt;&#x2F;p&gt;
&lt;p&gt;There’s a second issue. RFC 6886 says internal port 0 means “delete all mappings.” ProtonVPN expects internal port 1 — the actual value is irrelevant since ProtonVPN assigns the port server-side, but it must be non-zero. The &lt;code&gt;natpmp&lt;&#x2F;code&gt; crate sends 0 by default.&lt;&#x2F;p&gt;
&lt;p&gt;Two bugs, both trivial individually, both invisible until you’ve wasted an evening on packet captures.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-protocol&quot;&gt;The protocol&lt;&#x2F;h2&gt;
&lt;p&gt;NAT-PMP is refreshingly simple. Twelve bytes out, sixteen bytes back, over UDP to port 5351:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;fn request_protocol_mapping(&amp;amp;self, opcode: u8, lifetime_secs: u32) -&amp;gt; Result&amp;lt;u16&amp;gt; {
    let socket = UdpSocket::bind(SocketAddr::new(self.bind_address, 0))?;
    socket.connect(SocketAddr::new(self.gateway, NATPMP_PORT))?;

    let mut request = [0u8; 12];
    request[1] = opcode;
    request[4..6].copy_from_slice(&amp;amp;1u16.to_be_bytes());     &amp;#x2F;&amp;#x2F; internal port = 1
    request[8..12].copy_from_slice(&amp;amp;lifetime_secs.to_be_bytes());

    &amp;#x2F;&amp;#x2F; Exponential backoff per RFC 6886: 250ms initial, doubling, up to 9 attempts
    let mut timeout = Duration::from_millis(250);
    for attempt in 0..9 {
        socket.set_read_timeout(Some(timeout))?;
        socket.send(&amp;amp;request)?;
        let mut buf = [0u8; 16];
        match socket.recv(&amp;amp;mut buf) {
            Ok(16) =&amp;gt; {
                let result_code = u16::from_be_bytes([buf[2], buf[3]]);
                if result_code != 0 {
                    anyhow::bail!(&amp;quot;NAT-PMP error: result code {result_code}&amp;quot;);
                }
                let external_port = u16::from_be_bytes([buf[10], buf[11]]);
                return Ok(external_port);
            }
            Ok(n) =&amp;gt; anyhow::bail!(&amp;quot;Unexpected response size: {n}&amp;quot;),
            Err(_) if attempt &amp;lt; 8 =&amp;gt; {
                timeout *= 2;
                continue;
            }
            Err(e) =&amp;gt; return Err(e.into()),
        }
    }
    anyhow::bail!(&amp;quot;NAT-PMP request timed out after 9 attempts&amp;quot;)
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
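&lt;p&gt;The request format is:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;version (1 byte, always 0)&lt;&#x2F;li&gt;
&lt;li&gt;opcode (1 byte: 1 for UDP, 2 for TCP)&lt;&#x2F;li&gt;
&lt;li&gt;reserved (2 bytes)&lt;&#x2F;li&gt;
&lt;li&gt;internal port (2 bytes)&lt;&#x2F;li&gt;
&lt;li&gt;suggested external port (2 bytes)&lt;&#x2F;li&gt;
&lt;li&gt;requested lifetime (4 bytes)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;All fields are in network byte order. The 16-byte response contains:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;the version&lt;&#x2F;li&gt;
&lt;li&gt;the request opcode plus 128&lt;&#x2F;li&gt;
&lt;li&gt;a result code&lt;&#x2F;li&gt;
&lt;li&gt;the gateway’s seconds-since-epoch counter&lt;&#x2F;li&gt;
&lt;li&gt;the internal port&lt;&#x2F;li&gt;
&lt;li&gt;the actually assigned external port (bytes 10 and 11, which is what the code reads)&lt;&#x2F;li&gt;
&lt;li&gt;the granted lifetime&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The RFC specifies exponential backoff starting at 250ms and doubling each attempt. Nine attempts cover about two minutes of retries before giving up.&lt;&#x2F;p&gt;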
&lt;p&gt;The daemon requests both UDP and TCP mappings. If they differ — which shouldn’t happen with ProtonVPN but can with other NAT-PMP gateways — it logs the discrepancy and uses the TCP port, since that’s what qBittorrent primarily uses for peer connections.&lt;&#x2F;p&gt;
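&lt;p&gt;With the socket handling stripped away, the byte layout is easy to check in isolation. This sketch is not the daemon’s code; &lt;code&gt;encode_request&lt;&#x2F;code&gt;, &lt;code&gt;decode_response&lt;&#x2F;code&gt;, and &lt;code&gt;Mapping&lt;&#x2F;code&gt; are hypothetical names. It builds the 12-byte mapping request for either opcode, with the internal port fixed at 1 for ProtonVPN. It then reads the result code, mapped external port, and granted lifetime out of a 16-byte response at the RFC 6886 offsets.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;&amp;#x2F;&amp;#x2F;&amp;#x2F; RFC 6886 opcodes: 1 maps a UDP port, 2 maps a TCP port.
pub const OP_MAP_UDP: u8 = 1;
pub const OP_MAP_TCP: u8 = 2;

&amp;#x2F;&amp;#x2F;&amp;#x2F; Build a 12-byte mapping request. Layout (big-endian): version, opcode,
&amp;#x2F;&amp;#x2F;&amp;#x2F; reserved (2), internal port (2), suggested external port (2), lifetime (4).
pub fn encode_request(opcode: u8, lifetime_secs: u32) -&amp;gt; [u8; 12] {
    let mut req = [0u8; 12]; &amp;#x2F;&amp;#x2F; version 0; reserved and suggested port stay 0
    req[1] = opcode;
    req[4..6].copy_from_slice(&amp;amp;1u16.to_be_bytes()); &amp;#x2F;&amp;#x2F; non-zero internal port for ProtonVPN
    req[8..12].copy_from_slice(&amp;amp;lifetime_secs.to_be_bytes());
    req
}

#[derive(Debug, PartialEq)]
pub struct Mapping {
    pub external_port: u16,
    pub lifetime_secs: u32,
}

&amp;#x2F;&amp;#x2F;&amp;#x2F; Parse a 16-byte response: version, opcode + 128, result code (2), seconds
&amp;#x2F;&amp;#x2F;&amp;#x2F; since epoch (4), internal port (2), mapped external port (2), lifetime (4).
pub fn decode_response(opcode: u8, buf: &amp;amp;[u8]) -&amp;gt; Result&amp;lt;Mapping, String&amp;gt; {
    if buf.len() != 16 {
        return Err(format!(&amp;quot;unexpected response size: {}&amp;quot;, buf.len()));
    }
    if buf[1] != (opcode | 0x80) {
        return Err(format!(&amp;quot;unexpected opcode in response: {}&amp;quot;, buf[1]));
    }
    let result_code = u16::from_be_bytes([buf[2], buf[3]]);
    if result_code != 0 {
        return Err(format!(&amp;quot;NAT-PMP error: result code {result_code}&amp;quot;));
    }
    Ok(Mapping {
        external_port: u16::from_be_bytes([buf[10], buf[11]]),
        lifetime_secs: u32::from_be_bytes([buf[12], buf[13], buf[14], buf[15]]),
    })
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Checking the length and the echoed opcode before reading the port keeps a truncated or unrelated packet from being mistaken for a new mapping. The granted lifetime is worth keeping too, since the gateway may grant a different lifetime than the one requested.&lt;&#x2F;p&gt;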
&lt;h2 id=&quot;the-main-loop&quot;&gt;The main loop&lt;&#x2F;h2&gt;
&lt;p&gt;The core logic is a loop that renews the NAT-PMP mapping, detects port changes, and pushes updates to qBittorrent:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;loop {
    match natpmp_client.request_mapping(60) {
        Ok(port) =&amp;gt; {
            fail_count = 0;
            if current_port != Some(port) {
                info!(%port, &amp;quot;Port changed, updating qBittorrent&amp;quot;);
                qbt.set_listen_port(port).await?;
                current_port = Some(port);
                &amp;#x2F;&amp;#x2F; Update Prometheus metrics
            }
        }
        Err(e) =&amp;gt; {
            warn!(?e, &amp;quot;NAT-PMP renewal failed&amp;quot;);
            fail_count += 1;
            if fail_count &amp;gt;= max_failures {
                warn!(&amp;quot;Too many failures, restarting WireGuard&amp;quot;);
                Command::new(&amp;quot;systemctl&amp;quot;)
                    .args([&amp;quot;restart&amp;quot;, &amp;amp;wg_unit])
                    .status()?;
                fail_count = 0;
                current_port = None;
                sleep(Duration::from_secs(10)).await;
                continue;
            }
            sleep(Duration::from_secs(15)).await;
            continue;
        }
    }
    sleep(Duration::from_secs(renew_interval)).await;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Three design choices worth noting:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;45-second renewal interval.&lt;&#x2F;strong&gt; NAT-PMP mappings are requested with a 60-second lifetime and renewed at 45 seconds. That’s a 15-second buffer — enough to absorb a slow response without the mapping expiring.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;3-failure threshold.&lt;&#x2F;strong&gt; After three consecutive NAT-PMP failures, the daemon restarts the WireGuard unit. This sounds aggressive, but in practice NAT-PMP failures almost always mean the tunnel is in a bad state. A stale gateway, a half-torn-down connection, a routing table that’s out of sync — restarting WireGuard clears all of it. The loop sleeps 15 seconds after each of the first two failures and restarts on the third. If requests fail fast, the restart comes after about 30 seconds. If each request instead times out through its full retry backoff, the wait stretches to several minutes.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;10-second post-restart cooldown.&lt;&#x2F;strong&gt; After restarting WireGuard, the daemon sleeps for 10 seconds before retrying. The tunnel needs time to re-establish — handshake, key exchange, routing table update. Hammering NAT-PMP requests during that window just adds noise to the logs.&lt;&#x2F;p&gt;
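&lt;p&gt;These timings interact with the retry schedule inside each request, so it helps to do the arithmetic once. The sketch below uses hypothetical function names and assumes the loop behaves as shown:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Each failed renewal is one request that, in the worst case, times out on all nine RFC 6886 attempts.&lt;&#x2F;li&gt;
&lt;li&gt;The loop sleeps 15 seconds after each failure.&lt;&#x2F;li&gt;
&lt;li&gt;The failure that reaches the threshold triggers the WireGuard restart immediately.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;use std::time::Duration;

&amp;#x2F;&amp;#x2F;&amp;#x2F; Time one request spends in its RFC 6886 backoff if every attempt times out:
&amp;#x2F;&amp;#x2F;&amp;#x2F; 250 ms for the first attempt, doubling for each attempt after that.
pub fn backoff_total(attempts: u32) -&amp;gt; Duration {
    (0..attempts)
        .map(|i| Duration::from_millis(250) * 2u32.pow(i))
        .sum()
}

&amp;#x2F;&amp;#x2F;&amp;#x2F; Worst case from the first failed renewal to the WireGuard restart: every
&amp;#x2F;&amp;#x2F;&amp;#x2F; failure pays the full backoff, and all but the last are followed by the
&amp;#x2F;&amp;#x2F;&amp;#x2F; loop&amp;#x27;s retry sleep.
pub fn worst_case_until_restart(max_failures: u32, attempts: u32, retry_sleep: Duration) -&amp;gt; Duration {
    backoff_total(attempts) * max_failures + retry_sleep * max_failures.saturating_sub(1)
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Take the defaults: three failures, nine attempts, and 15-second sleeps. A fully timed-out request then takes 127.75 seconds, and the restart comes after 413.25 seconds, roughly seven minutes. Requests that fail fast spend almost nothing in backoff, so the two sleeps bring the restart after about 30 seconds.&lt;&#x2F;p&gt;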
&lt;h2 id=&quot;talking-to-qbittorrent&quot;&gt;Talking to qBittorrent&lt;&#x2F;h2&gt;
&lt;p&gt;qBittorrent exposes a WebUI API. The daemon uses two endpoints — login and set preferences:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;pub struct QbtClient {
    client: reqwest::Client,  &amp;#x2F;&amp;#x2F; with cookie store
    base_url: String,
    username: String,
    password: String,
}

impl QbtClient {
    pub async fn set_listen_port(&amp;amp;self, port: u16) -&amp;gt; Result&amp;lt;()&amp;gt; {
        self.login().await?;
        self.client
            .post(format!(&amp;quot;{}&amp;#x2F;api&amp;#x2F;v2&amp;#x2F;app&amp;#x2F;setPreferences&amp;quot;, self.base_url))
            .form(&amp;amp;[(&amp;quot;json&amp;quot;, format!(r#&amp;quot;{{&amp;quot;listen_port&amp;quot;:{port}}}&amp;quot;#))])
            .send().await?
            .error_for_status()?;
        Ok(())
    }

    async fn login(&amp;amp;self) -&amp;gt; Result&amp;lt;()&amp;gt; {
        self.client
            .post(format!(&amp;quot;{}&amp;#x2F;api&amp;#x2F;v2&amp;#x2F;auth&amp;#x2F;login&amp;quot;, self.base_url))
            .form(&amp;amp;[(&amp;quot;username&amp;quot;, &amp;amp;self.username), (&amp;quot;password&amp;quot;, &amp;amp;self.password)])
            .send().await?
            .error_for_status()?;
        Ok(())
    }
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The reqwest client is configured with a cookie store, so the session cookie from &lt;code&gt;login()&lt;&#x2F;code&gt; persists across requests. The daemon re-authenticates on every port change — qBittorrent sessions can expire, and re-logging in is cheaper than tracking session state.&lt;&#x2F;p&gt;
&lt;p&gt;The password comes from a file (&lt;code&gt;--qbt-password-file&lt;&#x2F;code&gt;), loaded once at startup. No passwords in CLI args, no passwords in environment variables. The file path is the only thing that appears in process listings or systemd unit files.&lt;&#x2F;p&gt;
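&lt;p&gt;One practical wrinkle with password files is the trailing newline. A file written with &lt;code&gt;echo&lt;&#x2F;code&gt; ends in one, and sending it to qBittorrent as part of the password would make the login fail. The post doesn’t show how the daemon reads the file. &lt;code&gt;read_password_file&lt;&#x2F;code&gt; is a hypothetical helper that sketches one reasonable approach: read once, strip a single trailing line ending, and reject an empty result.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;use std::fs;
use std::io;
use std::path::Path;

&amp;#x2F;&amp;#x2F;&amp;#x2F; Hypothetical helper: read a secret from `path` once, at startup. Strips one
&amp;#x2F;&amp;#x2F;&amp;#x2F; trailing &amp;quot;\n&amp;quot; or &amp;quot;\r\n&amp;quot; but keeps any other whitespace, since that
&amp;#x2F;&amp;#x2F;&amp;#x2F; could be part of the password itself.
pub fn read_password_file(path: &amp;amp;Path) -&amp;gt; io::Result&amp;lt;String&amp;gt; {
    let raw = fs::read_to_string(path)?;
    let password = raw
        .strip_suffix(&amp;quot;\r\n&amp;quot;)
        .or_else(|| raw.strip_suffix(&amp;#x27;\n&amp;#x27;))
        .unwrap_or(raw.as_str());
    if password.is_empty() {
        return Err(io::Error::new(io::ErrorKind::InvalidData, &amp;quot;password file is empty&amp;quot;));
    }
    Ok(password.to_string())
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Because the file is read once at startup, a rotated password only takes effect after the service restarts.&lt;&#x2F;p&gt;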
&lt;h2 id=&quot;prometheus-metrics&quot;&gt;Prometheus metrics&lt;&#x2F;h2&gt;
&lt;p&gt;Six metrics on an optional &lt;code&gt;&#x2F;metrics&lt;&#x2F;code&gt; endpoint:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;proton_port_sync_current_port          — currently mapped port (gauge)
proton_port_sync_port_changes_total    — total port changes (counter)
proton_port_sync_last_change_timestamp — unix timestamp of last change (gauge)
proton_port_sync_renewals_total        — successful NAT-PMP renewals (counter)
proton_port_sync_failures_total        — NAT-PMP request failures (counter)
proton_port_sync_wg_restarts_total     — WireGuard restarts triggered (counter)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The useful alerts: &lt;code&gt;port_changes_total&lt;&#x2F;code&gt; increasing rapidly means the gateway is reassigning ports faster than expected — possibly a VPN issue. &lt;code&gt;failures_total&lt;&#x2F;code&gt; spiking means the tunnel is unstable. &lt;code&gt;wg_restarts_total&lt;&#x2F;code&gt; going up means the daemon is repeatedly cycling WireGuard, which warrants investigation.&lt;&#x2F;p&gt;
&lt;p&gt;The metrics endpoint is served via axum on a configurable address and port. It’s optional — skip &lt;code&gt;--metrics-addr&lt;&#x2F;code&gt; and the daemon runs without it. But if you’re already running Prometheus, the five minutes to wire it up will save you the next time something goes silently wrong at 3 AM.&lt;&#x2F;p&gt;
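&lt;p&gt;For a sense of what Prometheus actually scrapes, here is a std-only sketch that renders the six metrics in the Prometheus text exposition format. Each metric gets a &lt;code&gt;# HELP&lt;&#x2F;code&gt; line, a &lt;code&gt;# TYPE&lt;&#x2F;code&gt; line, and then the sample. This is not the daemon’s actual code, which serves the endpoint through axum. The &lt;code&gt;Metrics&lt;&#x2F;code&gt; struct and &lt;code&gt;render&lt;&#x2F;code&gt; function are hypothetical, and the help strings are adapted from the list above.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;use std::fmt::Write;

&amp;#x2F;&amp;#x2F;&amp;#x2F; Hypothetical snapshot of the daemon&amp;#x27;s state at scrape time.
pub struct Metrics {
    pub current_port: u16,
    pub port_changes_total: u64,
    pub last_change_timestamp: u64,
    pub renewals_total: u64,
    pub failures_total: u64,
    pub wg_restarts_total: u64,
}

&amp;#x2F;&amp;#x2F;&amp;#x2F; Render each metric as a HELP line, a TYPE line, and a `name value` sample.
pub fn render(m: &amp;amp;Metrics) -&amp;gt; String {
    let rows: [(&amp;amp;str, &amp;amp;str, &amp;amp;str, u64); 6] = [
        (&amp;quot;current_port&amp;quot;, &amp;quot;gauge&amp;quot;, &amp;quot;Currently mapped port&amp;quot;, m.current_port.into()),
        (&amp;quot;port_changes_total&amp;quot;, &amp;quot;counter&amp;quot;, &amp;quot;Total port changes&amp;quot;, m.port_changes_total),
        (&amp;quot;last_change_timestamp&amp;quot;, &amp;quot;gauge&amp;quot;, &amp;quot;Unix timestamp of last change&amp;quot;, m.last_change_timestamp),
        (&amp;quot;renewals_total&amp;quot;, &amp;quot;counter&amp;quot;, &amp;quot;Successful NAT-PMP renewals&amp;quot;, m.renewals_total),
        (&amp;quot;failures_total&amp;quot;, &amp;quot;counter&amp;quot;, &amp;quot;NAT-PMP request failures&amp;quot;, m.failures_total),
        (&amp;quot;wg_restarts_total&amp;quot;, &amp;quot;counter&amp;quot;, &amp;quot;WireGuard restarts triggered&amp;quot;, m.wg_restarts_total),
    ];
    let mut out = String::new();
    for (suffix, kind, help, value) in rows {
        let name = format!(&amp;quot;proton_port_sync_{suffix}&amp;quot;);
        &amp;#x2F;&amp;#x2F; Writing into a String cannot fail, so the fmt::Result is ignored.
        let _ = writeln!(out, &amp;quot;# HELP {name} {help}&amp;quot;);
        let _ = writeln!(out, &amp;quot;# TYPE {name} {kind}&amp;quot;);
        let _ = writeln!(out, &amp;quot;{name} {value}&amp;quot;);
    }
    out
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;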
&lt;h2 id=&quot;the-nixos-module&quot;&gt;The NixOS module&lt;&#x2F;h2&gt;
&lt;p&gt;The module wraps all of this in a declarative systemd service:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ config, lib, pkgs, ... }:
let
  cfg = config.services.proton-port-sync;
in {
  options.services.proton-port-sync = {
    enable = lib.mkEnableOption &amp;quot;proton-port-sync&amp;quot;;
    gateway = lib.mkOption { type = lib.types.str; default = &amp;quot;10.2.0.1&amp;quot;; };
    bindAddress = lib.mkOption { type = lib.types.str; default = &amp;quot;10.2.0.2&amp;quot;; };
    qbtUrl = lib.mkOption { type = lib.types.str; default = &amp;quot;http:&amp;#x2F;&amp;#x2F;127.0.0.1:8080&amp;quot;; };
    qbtUser = lib.mkOption { type = lib.types.str; default = &amp;quot;admin&amp;quot;; };
    qbtPasswordFile = lib.mkOption { type = lib.types.path; };
    renewInterval = lib.mkOption { type = lib.types.int; default = 45; };
    maxFailures = lib.mkOption { type = lib.types.int; default = 3; };
    wgUnit = lib.mkOption { type = lib.types.str; default = &amp;quot;wireguard-wg0.service&amp;quot;; };
    metrics = {
      enable = lib.mkEnableOption &amp;quot;Prometheus metrics&amp;quot;;
      address = lib.mkOption { type = lib.types.str; default = &amp;quot;127.0.0.1&amp;quot;; };
      port = lib.mkOption { type = lib.types.port; default = 9834; };
    };
  };

  config = lib.mkIf cfg.enable {
    systemd.services.proton-port-sync = {
      description = &amp;quot;Proton VPN NAT-PMP port sync for qBittorrent&amp;quot;;
      after = [ &amp;quot;network-online.target&amp;quot; cfg.wgUnit ];
      bindsTo = [ cfg.wgUnit ];
      wants = [ &amp;quot;qbittorrent.service&amp;quot; ];
      wantedBy = [ &amp;quot;multi-user.target&amp;quot; ];

      serviceConfig = {
        ExecStart = &amp;quot;${pkgs.proton-port-sync}&amp;#x2F;bin&amp;#x2F;proton-port-sync&amp;quot;
          + &amp;quot; --gateway ${cfg.gateway}&amp;quot;
          + &amp;quot; --bind-address ${cfg.bindAddress}&amp;quot;
          + &amp;quot; --qbt-url ${cfg.qbtUrl}&amp;quot;
          + &amp;quot; --qbt-user ${cfg.qbtUser}&amp;quot;
          + &amp;quot; --qbt-password-file \${CREDENTIALS_DIRECTORY}&amp;#x2F;qbt-password&amp;quot;
          + lib.optionalString cfg.metrics.enable
            &amp;quot; --metrics-addr ${cfg.metrics.address}:${toString cfg.metrics.port}&amp;quot;;

        LoadCredential = &amp;quot;qbt-password:${cfg.qbtPasswordFile}&amp;quot;;
        Restart = &amp;quot;on-failure&amp;quot;;
        RestartSec = &amp;quot;5s&amp;quot;;

        # Security hardening
        ProtectSystem = &amp;quot;strict&amp;quot;;
        ProtectHome = true;
        NoNewPrivileges = true;
        PrivateTmp = true;
      };
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The systemd wiring matters more than it looks:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;bindsTo&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; ties the service lifecycle to WireGuard. If WireGuard stops — whether from a manual stop, a crash, or a restart — this service stops too. No orphaned daemon sending NAT-PMP requests into the void.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;wants&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; pulls in qBittorrent as a soft dependency. The daemon starts regardless of whether qBittorrent is running — it’ll just fail to update the port until qBittorrent comes up. A hard dependency (&lt;code&gt;requires&lt;&#x2F;code&gt;) would be wrong here. With &lt;code&gt;Requires=&lt;&#x2F;code&gt;, stopping or restarting qBittorrent would stop or restart this service too, and the daemon should keep running through a qBittorrent restart.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;LoadCredential&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; handles the password file. systemd copies the file into a private credentials directory, accessible only to the service. The original file path never appears in the process’s environment or arguments — only the path under &lt;code&gt;$CREDENTIALS_DIRECTORY&lt;&#x2F;code&gt;. This is systemd’s credential passing mechanism, and it’s strictly better than &lt;code&gt;EnvironmentFile&lt;&#x2F;code&gt; or passing secrets via CLI args.&lt;&#x2F;p&gt;
&lt;p&gt;The hardening options (&lt;code&gt;ProtectSystem=strict&lt;&#x2F;code&gt;, &lt;code&gt;ProtectHome=true&lt;&#x2F;code&gt;, &lt;code&gt;NoNewPrivileges=true&lt;&#x2F;code&gt;, &lt;code&gt;PrivateTmp=true&lt;&#x2F;code&gt;) are standard for a daemon that only needs network access and one credential file. The service sees the regular filesystem as read-only apart from its own private &lt;code&gt;&#x2F;tmp&lt;&#x2F;code&gt;, can’t read home directories, and can’t escalate privileges. If the binary is compromised, the blast radius is much smaller.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;wiring-it-up&quot;&gt;Wiring it up&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;flake-input&quot;&gt;Flake input&lt;&#x2F;h3&gt;
&lt;p&gt;Add the flake to your inputs with &lt;code&gt;nixpkgs.follows&lt;&#x2F;code&gt; to avoid pulling a second copy of nixpkgs:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;inputs = {
  proton-port-sync.url = &amp;quot;github:ijohanne&amp;#x2F;proton-port-sync&amp;quot;;
  proton-port-sync.inputs.nixpkgs.follows = &amp;quot;nixpkgs&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;host-configuration&quot;&gt;Host configuration&lt;&#x2F;h3&gt;
&lt;p&gt;Import the module and configure it:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;imports = [ proton-port-sync.nixosModules.default ];

services.proton-port-sync = {
  enable = true;
  gateway = &amp;quot;10.2.0.1&amp;quot;;
  qbtUser = &amp;quot;admin&amp;quot;;
  qbtPasswordFile = config.sops.secrets.&amp;quot;qbittorrent&amp;#x2F;webui_password&amp;quot;.path;
  metrics = {
    enable = true;
    address = &amp;quot;10.100.0.10&amp;quot;;  # private backhaul IP
    port = 9834;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;qbtPasswordFile&lt;&#x2F;code&gt; points to a sops-nix secret. On activation, sops-nix decrypts it to &lt;code&gt;&#x2F;run&#x2F;secrets&#x2F;qbittorrent&#x2F;webui_password&lt;&#x2F;code&gt;, and systemd’s &lt;code&gt;LoadCredential&lt;&#x2F;code&gt; picks it up from there. The password never appears in your Nix configuration, never lands in the Nix store, and only exists decrypted in a tmpfs mount.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;prometheus-scraping&quot;&gt;Prometheus scraping&lt;&#x2F;h3&gt;
&lt;p&gt;On your monitoring host, add a scrape target:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.prometheus.scrapeConfigs = [
  {
    job_name = &amp;quot;proton-port-sync&amp;quot;;
    honor_labels = true;
    static_configs = [
      {
        targets = [ &amp;quot;10.100.0.10:9834&amp;quot; ];
        labels = { instance = &amp;quot;myhost&amp;quot;; };
      }
    ];
  }
];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That’s the full stack — NAT-PMP renewals, automatic qBittorrent updates, WireGuard failure recovery, Prometheus metrics, and sops-nix secrets management, all declared in a few dozen lines of Nix. The daemon itself is about 300 lines of Rust. The interesting part wasn’t the protocol or the API integration — it was the one-line socket bind fix that makes it actually work behind policy routing.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Using Private GitHub Repositories with Nix Flakes</title>
        <published>2026-03-23T12:00:00+00:00</published>
        <updated>2026-03-23T12:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/private-github-repos-nix-flakes/"/>
        <id>https://perlpimp.net/blog/private-github-repos-nix-flakes/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/private-github-repos-nix-flakes/">&lt;p&gt;Private GitHub repos and Nix flakes have a failure mode that wastes exactly the right amount of time to be infuriating. A flake input like this:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;inputs = {
  my-tool.url = &amp;quot;github:myorg&amp;#x2F;private-tool&amp;quot;;
  my-tool.inputs.nixpkgs.follows = &amp;quot;nixpkgs&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;produces this on &lt;code&gt;nix flake update&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;error: unable to download &amp;#x27;https:&amp;#x2F;&amp;#x2F;api.github.com&amp;#x2F;repos&amp;#x2F;myorg&amp;#x2F;private-tool&amp;#x2F;tarball&amp;#x2F;...&amp;#x27;: HTTP error 404
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A 404. Not a 401. Not “authentication required.” Just “not found” — as if the repo doesn’t exist. GitHub returns 404 for unauthenticated requests to private repos, specifically to avoid confirming that the repo exists. It’s a deliberate security choice that sends people down exactly the wrong debugging path — double-checking URLs, triple-checking org names, wondering if someone deleted the repo.&lt;&#x2F;p&gt;
&lt;p&gt;The fix is to tell Nix how to authenticate.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-access-tokens-work&quot;&gt;How access-tokens work&lt;&#x2F;h2&gt;
&lt;p&gt;Nix has a built-in &lt;code&gt;access-tokens&lt;&#x2F;code&gt; setting for exactly this:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;access-tokens = github.com=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;When Nix fetches from &lt;code&gt;github.com&lt;&#x2F;code&gt;, it includes this token in the request headers. Simple key-value — the host on the left, the token on the right. One line handles every private repo on that host.&lt;&#x2F;p&gt;
&lt;p&gt;This setting can live in:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;~&#x2F;.config&#x2F;nix&#x2F;nix.conf&lt;&#x2F;code&gt; — user-level&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;&#x2F;etc&#x2F;nix&#x2F;nix.conf&lt;&#x2F;code&gt; — system-level (for the Nix daemon)&lt;&#x2F;li&gt;
&lt;li&gt;A separate file pulled in via &lt;code&gt;!include&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That third option is the important one. The &lt;code&gt;!include&lt;&#x2F;code&gt; directive lets you keep the token in its own file, managed and rotated independently from &lt;code&gt;nix.conf&lt;&#x2F;code&gt;. You don’t want a long-lived secret sitting in a config file that might be world-readable, committed to a dotfiles repo, or copied around between machines.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;!include &amp;#x2F;path&amp;#x2F;to&amp;#x2F;access-tokens.conf
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The included file contains the &lt;code&gt;access-tokens = ...&lt;&#x2F;code&gt; line. Nix reads it at evaluation time. The token never touches &lt;code&gt;nix.conf&lt;&#x2F;code&gt; itself.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;creating-the-right-token&quot;&gt;Creating the right token&lt;&#x2F;h2&gt;
&lt;p&gt;Before wiring up configuration, you need a token with the right scope.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Fine-grained tokens&lt;&#x2F;strong&gt; (the newer option):&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;GitHub → Settings → Developer settings → Personal access tokens → Fine-grained tokens&lt;&#x2F;li&gt;
&lt;li&gt;Repository access: select the specific private repos your flake needs&lt;&#x2F;li&gt;
&lt;li&gt;Permissions: Contents → Read-only&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;That’s the minimum. Nix fetches tarballs from the GitHub API — it needs to read repository contents, nothing else. No write access, no admin, no workflow permissions.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Classic tokens&lt;&#x2F;strong&gt; with &lt;code&gt;repo&lt;&#x2F;code&gt; scope also work but grant far more access than Nix needs. If you’re already using a classic token for other purposes, it’ll work. If you’re creating one specifically for Nix, fine-grained is the better choice.&lt;&#x2F;p&gt;
&lt;p&gt;One token, all repos. If you give it access to &lt;code&gt;myorg&#x2F;tool-a&lt;&#x2F;code&gt;, &lt;code&gt;myorg&#x2F;tool-b&lt;&#x2F;code&gt;, and &lt;code&gt;myorg&#x2F;tool-c&lt;&#x2F;code&gt;, a single &lt;code&gt;access-tokens&lt;&#x2F;code&gt; line authenticates fetches for all of them:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;inputs = {
  tool-a.url = &amp;quot;github:myorg&amp;#x2F;tool-a&amp;quot;;
  tool-b.url = &amp;quot;github:myorg&amp;#x2F;tool-b&amp;quot;;
  tool-c.url = &amp;quot;github:myorg&amp;#x2F;tool-c&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;No per-input configuration needed.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;pattern-1-nixos-with-sops-nix&quot;&gt;Pattern 1: NixOS with sops-nix&lt;&#x2F;h2&gt;
&lt;p&gt;On NixOS, the Nix daemon runs as root and reads system-level configuration. sops-nix handles decryption — encrypted secrets in your repo, decrypted into &lt;code&gt;&#x2F;run&#x2F;secrets&#x2F;&lt;&#x2F;code&gt; during system activation.&lt;&#x2F;p&gt;
&lt;p&gt;Declare the secret and tell Nix to include it:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;sops.secrets.nix_builder_access_tokens = { };

nix.extraOptions = &amp;#x27;&amp;#x27;
  !include ${config.sops.secrets.nix_builder_access_tokens.path}
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The encrypted secrets file (&lt;code&gt;secrets&#x2F;myhost.yaml&lt;&#x2F;code&gt;) contains:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;yaml&quot; class=&quot;language-yaml &quot;&gt;&lt;code class=&quot;language-yaml&quot; data-lang=&quot;yaml&quot;&gt;nix_builder_access_tokens: ENC[AES256_GCM,data:...,type:str]
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The plaintext value is just:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;access-tokens = github.com=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;On activation, sops-nix decrypts the YAML, writes the plaintext to &lt;code&gt;&#x2F;run&#x2F;secrets&#x2F;nix_builder_access_tokens&lt;&#x2F;code&gt;, and Nix picks it up via the &lt;code&gt;!include&lt;&#x2F;code&gt;. The token is never in your nix.conf, never in your repo, and only exists decrypted in a tmpfs mount.&lt;&#x2F;p&gt;
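&lt;p&gt;By default sops-nix makes each secret readable by root only (owner &lt;code&gt;root&lt;&#x2F;code&gt;, mode &lt;code&gt;0400&lt;&#x2F;code&gt;). That suits the daemon and root-run rebuilds. You may also want your own unprivileged &lt;code&gt;nix&lt;&#x2F;code&gt; commands to read the token through the same &lt;code&gt;!include&lt;&#x2F;code&gt;. A minimal sketch for that case widens access with the secret’s &lt;code&gt;group&lt;&#x2F;code&gt; and &lt;code&gt;mode&lt;&#x2F;code&gt; options, used in place of the empty &lt;code&gt;{ }&lt;&#x2F;code&gt; declaration above. The &lt;code&gt;wheel&lt;&#x2F;code&gt; group is an assumption, so pick whichever group fits your machines.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;sops.secrets.nix_builder_access_tokens = {
  # Hypothetical choice: members of wheel may read the decrypted token.
  group = &amp;quot;wheel&amp;quot;;
  # Owner and group may read; everyone else is locked out (default is 0400).
  mode = &amp;quot;0440&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;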
&lt;h3 id=&quot;setting-up-sops-nix&quot;&gt;Setting up sops-nix&lt;&#x2F;h3&gt;
&lt;p&gt;If you haven’t set up sops-nix yet, here’s the short version. &lt;code&gt;.sops.yaml&lt;&#x2F;code&gt; at your repo root maps secrets to age keys:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;yaml&quot; class=&quot;language-yaml &quot;&gt;&lt;code class=&quot;language-yaml&quot; data-lang=&quot;yaml&quot;&gt;keys:
  - &amp;amp;myhost age1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
  - &amp;amp;personal age1yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy

creation_rules:
  - path_regex: secrets&amp;#x2F;myhost\.(yaml|json|env|ini)$
    key_groups:
      - age:
          - *personal
          - *myhost
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
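&lt;p&gt;Each host’s age key is derived from its ed25519 SSH host key. &lt;code&gt;ssh-to-age&lt;&#x2F;code&gt; only converts ed25519 keys, so ask &lt;code&gt;ssh-keyscan&lt;&#x2F;code&gt; for that type:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;ssh-keyscan -t ed25519 myhost.example.com 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null | ssh-to-age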
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Create or edit secrets with:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;sops secrets&amp;#x2F;myhost.yaml
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;sops encrypts the file with every key listed in the matching creation rule. The host can decrypt with its own key, and you can decrypt with your personal key.&lt;&#x2F;p&gt;
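&lt;p&gt;The host also has to know which encrypted file to read and which key to decrypt with. Here is a minimal sketch. It assumes the &lt;code&gt;secrets&#x2F;myhost.yaml&lt;&#x2F;code&gt; layout above and the standard ed25519 host key location. &lt;code&gt;sops.defaultSopsFile&lt;&#x2F;code&gt; and &lt;code&gt;sops.age.sshKeyPaths&lt;&#x2F;code&gt; are the sops-nix options involved. The second may already default to your host keys when OpenSSH is enabled, but spelling it out makes the dependency visible.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# Encrypted file the sops.secrets declarations are read from
# (path is relative to this module; adjust to your repo layout).
sops.defaultSopsFile = .&amp;#x2F;secrets&amp;#x2F;myhost.yaml;

# Decrypt with the age identity derived from the host&amp;#x27;s SSH key,
# the same key the ssh-to-age step converted.
sops.age.sshKeyPaths = [ &amp;quot;&amp;#x2F;etc&amp;#x2F;ssh&amp;#x2F;ssh_host_ed25519_key&amp;quot; ];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;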
&lt;h2 id=&quot;pattern-2-macos-darwin-with-activation-scripts&quot;&gt;Pattern 2: macOS&#x2F;Darwin with activation scripts&lt;&#x2F;h2&gt;
&lt;p&gt;macOS doesn’t have NixOS’s activation system, but nix-darwin provides activation scripts that run during &lt;code&gt;darwin-rebuild switch&lt;&#x2F;code&gt;. The approach: decrypt the token with sops-nix, then copy it into the user’s Nix config directory.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# Runs as root on every darwin-rebuild switch. `user.username` is assumed
# to be in scope in this module (for example passed in via specialArgs).
system.activationScripts.postActivation.text = &amp;#x27;&amp;#x27;
  if [ -f &amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;nix_builder_access_tokens ]; then
    USER_NIX_DIR=&amp;quot;&amp;#x2F;Users&amp;#x2F;${user.username}&amp;#x2F;.config&amp;#x2F;nix&amp;quot;
    mkdir -p &amp;quot;$USER_NIX_DIR&amp;quot;
    chown ${user.username}:staff &amp;quot;$USER_NIX_DIR&amp;quot;
    cp &amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;nix_builder_access_tokens &amp;quot;$USER_NIX_DIR&amp;#x2F;access-tokens.conf&amp;quot;
    chmod 600 &amp;quot;$USER_NIX_DIR&amp;#x2F;access-tokens.conf&amp;quot;
    chown ${user.username}:staff &amp;quot;$USER_NIX_DIR&amp;#x2F;access-tokens.conf&amp;quot;
    # Append the include line only if this exact line isn&amp;#x27;t already present.
    grep -qxF &amp;#x27;!include access-tokens.conf&amp;#x27; &amp;quot;$USER_NIX_DIR&amp;#x2F;nix.conf&amp;quot; 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null || \
      echo &amp;#x27;!include access-tokens.conf&amp;#x27; &amp;gt;&amp;gt; &amp;quot;$USER_NIX_DIR&amp;#x2F;nix.conf&amp;quot;
    chown ${user.username}:staff &amp;quot;$USER_NIX_DIR&amp;#x2F;nix.conf&amp;quot;
  fi
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This copies the decrypted token to &lt;code&gt;~&#x2F;.config&#x2F;nix&#x2F;access-tokens.conf&lt;&#x2F;code&gt; with mode 600 (readable and writable by the owner only), then ensures &lt;code&gt;nix.conf&lt;&#x2F;code&gt; has the &lt;code&gt;!include&lt;&#x2F;code&gt; line. It’s idempotent, so it’s safe to run on every switch.&lt;&#x2F;p&gt;
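&lt;p&gt;Because the token lands in your user-level config, it covers the &lt;code&gt;nix&lt;&#x2F;code&gt; commands you run as yourself, which is where flake inputs get fetched. Nix processes running as root, including the daemon, read &lt;code&gt;&#x2F;etc&#x2F;nix&#x2F;nix.conf&lt;&#x2F;code&gt; and won’t see this file.&lt;&#x2F;p&gt;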
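&lt;p&gt;You might prefer to keep the token at system level so that root-run Nix processes see it as well. In that case nix-darwin can render the same &lt;code&gt;!include&lt;&#x2F;code&gt; into &lt;code&gt;&#x2F;etc&#x2F;nix&#x2F;nix.conf&lt;&#x2F;code&gt;, mirroring Pattern 1. This sketch assumes three things: nix-darwin manages your Nix installation (so it owns that file), the sops-nix darwin module is imported, and the module takes &lt;code&gt;config&lt;&#x2F;code&gt; as an argument.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# Same declaration as on NixOS; root-only by default.
sops.secrets.nix_builder_access_tokens = { };

# nix-darwin writes nix.extraOptions into &amp;#x2F;etc&amp;#x2F;nix&amp;#x2F;nix.conf.
nix.extraOptions = &amp;#x27;&amp;#x27;
  !include ${config.sops.secrets.nix_builder_access_tokens.path}
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Your user account can’t read a root-only secret. If you also fetch as yourself, keep the user-level copy from the activation script alongside this.&lt;&#x2F;p&gt;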
&lt;h2 id=&quot;remote-builders&quot;&gt;Remote builders&lt;&#x2F;h2&gt;
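&lt;p&gt;If you use remote builders, work out which machine actually does the fetching. Flake inputs are fetched by whichever machine evaluates the flake. With plain &lt;code&gt;nix.buildMachines&lt;&#x2F;code&gt; offloading, that is your local machine. The fetched sources then travel to the builder as ordinary store paths, alongside the rest of the build inputs. The builder needs its own tokens as soon as it evaluates the flake itself, for example when you run &lt;code&gt;nix build&lt;&#x2F;code&gt; or &lt;code&gt;nixos-rebuild&lt;&#x2F;code&gt; on it directly.&lt;&#x2F;p&gt;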
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;nix.buildMachines = [
  {
    hostName = &amp;quot;builder.example.com&amp;quot;;
    systems = [ &amp;quot;x86_64-linux&amp;quot; &amp;quot;aarch64-linux&amp;quot; ];
    protocol = &amp;quot;ssh-ng&amp;quot;;
    sshUser = &amp;quot;root&amp;quot;;
    sshKey = &amp;quot;&amp;#x2F;etc&amp;#x2F;nix&amp;#x2F;builder_ed25519&amp;quot;;
    maxJobs = 64;
    speedFactor = 2;
    supportedFeatures = [ &amp;quot;nixos-test&amp;quot; &amp;quot;benchmark&amp;quot; &amp;quot;big-parallel&amp;quot; &amp;quot;kvm&amp;quot; ];
  }
];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Same &lt;code&gt;!include&lt;&#x2F;code&gt; pattern on the builder host. If the builder doesn’t have the token, it hits the same 404 when it tries to fetch your private inputs. There’s nothing special about how remote builders authenticate — they just need the same &lt;code&gt;access-tokens&lt;&#x2F;code&gt; configuration as any other Nix installation that fetches from private repos.&lt;&#x2F;p&gt;
&lt;p&gt;The builder’s SSH key can also be managed via sops-nix on the machine that initiates the connection.&lt;&#x2F;p&gt;
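&lt;p&gt;Here is a sketch of that. It assumes a secret named &lt;code&gt;builder_ssh_key&lt;&#x2F;code&gt; (a hypothetical name) that holds the private key in the initiating host’s sops file, and a module that takes &lt;code&gt;config&lt;&#x2F;code&gt; as an argument. The build machine entry points at the decrypted path instead of a hand-copied file. It also sets &lt;code&gt;nix.distributedBuilds&lt;&#x2F;code&gt;, which NixOS needs before it uses &lt;code&gt;nix.buildMachines&lt;&#x2F;code&gt; at all.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# Root-only by default, which is fine: the daemon opens the SSH connection.
sops.secrets.builder_ssh_key = { };

nix.distributedBuilds = true;
nix.buildMachines = [
  {
    hostName = &amp;quot;builder.example.com&amp;quot;;
    systems = [ &amp;quot;x86_64-linux&amp;quot; &amp;quot;aarch64-linux&amp;quot; ];
    protocol = &amp;quot;ssh-ng&amp;quot;;
    sshUser = &amp;quot;root&amp;quot;;
    # Decrypted by sops-nix at activation (under &amp;#x2F;run&amp;#x2F;secrets by default).
    sshKey = config.sops.secrets.builder_ssh_key.path;
  }
];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;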
&lt;h2 id=&quot;the-bootstrap-problem&quot;&gt;The bootstrap problem&lt;&#x2F;h2&gt;
&lt;p&gt;There’s a chicken-and-egg issue with fresh hosts. A newly installed NixOS machine needs access tokens to build its own configuration (because the flake has private inputs), but the tokens are in sops-nix secrets that only get deployed &lt;em&gt;after&lt;&#x2F;em&gt; a successful build.&lt;&#x2F;p&gt;
&lt;p&gt;The solution: build on an existing host that already has tokens, deploy the result to the new host.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix run nixpkgs#nixos-rebuild -- switch \
  --flake .#myhost \
  --target-host root@myhost.example.com \
  --build-host root@builder.example.com
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The machine you run this from (not the new host) evaluates the flake and fetches the private inputs with its own tokens. The builder builds the system closure, and the result is shipped to the target. After activation, sops-nix decrypts the tokens on the new host, and from that point forward it can build its own config.&lt;&#x2F;p&gt;
&lt;p&gt;This is a one-time problem per host. I wrote a &lt;a href=&quot;&#x2F;blog&#x2F;sops-bootstrap-problem&#x2F;&quot;&gt;full post on the bootstrap dance&lt;&#x2F;a&gt; if you want the details.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;common-mistakes&quot;&gt;Common mistakes&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;strong&gt;Token in nix.conf directly.&lt;&#x2F;strong&gt; Tempting for a quick test, dangerous as a habit. &lt;code&gt;&#x2F;etc&#x2F;nix&#x2F;nix.conf&lt;&#x2F;code&gt; is normally world-readable. On NixOS it’s generated into the Nix store, which every user on the machine can read. Use &lt;code&gt;!include&lt;&#x2F;code&gt; and a separate file with restrictive permissions.&lt;&#x2F;p&gt;
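&lt;p&gt;On NixOS the same mistake has a declarative form. Here is a minimal before-and-after, assuming the sops-nix secret from Pattern 1. A token set through &lt;code&gt;nix.settings&lt;&#x2F;code&gt; is rendered into the generated &lt;code&gt;nix.conf&lt;&#x2F;code&gt; in the store, and it also sits in your repo. The &lt;code&gt;!include&lt;&#x2F;code&gt; line only puts a path there.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# Don&amp;#x27;t: the literal token ends up in &amp;#x2F;nix&amp;#x2F;store via the generated nix.conf.
# nix.settings.access-tokens = &amp;quot;github.com=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxx&amp;quot;;

# Do: nix.conf only gains a path; the token stays in the sops-managed file.
nix.extraOptions = &amp;#x27;&amp;#x27;
  !include ${config.sops.secrets.nix_builder_access_tokens.path}
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;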
&lt;p&gt;&lt;strong&gt;Using netrc instead of access-tokens.&lt;&#x2F;strong&gt; Nix supports a netrc file for HTTP authentication (the &lt;code&gt;netrc-file&lt;&#x2F;code&gt; setting, &lt;code&gt;&#x2F;etc&#x2F;nix&#x2F;netrc&lt;&#x2F;code&gt; by default), and it works. But &lt;code&gt;access-tokens&lt;&#x2F;code&gt; is purpose-built for this. It’s simpler, specific to code forges, and doesn’t conflict with other tools that read netrc files.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Forgetting the Nix daemon.&lt;&#x2F;strong&gt; On multi-user Nix installs (the default on NixOS and recommended on macOS), not every Nix process runs as you. The daemon runs as root and reads &lt;code&gt;&#x2F;etc&#x2F;nix&#x2F;nix.conf&lt;&#x2F;code&gt;. So does anything you start with &lt;code&gt;sudo&lt;&#x2F;code&gt;, such as &lt;code&gt;sudo nixos-rebuild switch&lt;&#x2F;code&gt;. Neither ever sees a token that lives only in your user config. On NixOS, &lt;code&gt;nix.extraOptions&lt;&#x2F;code&gt; writes to the system-level config. On macOS, make sure the system-level config includes the token too, not just your user config.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Fine-grained token without Contents permission.&lt;&#x2F;strong&gt; The GitHub fine-grained token UI has a lot of permission knobs. If you skip Contents read or leave it at “No access,” Nix gets the same 404. It’s not a different error — you just don’t have permission to see the tarball.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Token expired or revoked.&lt;&#x2F;strong&gt; Fine-grained tokens have expiration dates. If your flake worked last week and doesn’t today, check the token. GitHub emails a reminder shortly before a token expires, but that message is easy to miss.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-minimal-setup&quot;&gt;The minimal setup&lt;&#x2F;h2&gt;
&lt;p&gt;If you just want it working and you’ll worry about secrets management later, the fastest path:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Create a fine-grained token with Contents read on your private repos&lt;&#x2F;li&gt;
&lt;li&gt;Add one line to &lt;code&gt;~&#x2F;.config&#x2F;nix&#x2F;nix.conf&lt;&#x2F;code&gt;:&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;pre&gt;&lt;code&gt;access-tokens = github.com=ghp_xxxxxxxxxxxxxxxxxxxxxxxxxxxx
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;ol start=&quot;3&quot;&gt;
&lt;li&gt;Run &lt;code&gt;nix flake update&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;That gets you unblocked. Then move the token into &lt;code&gt;!include&lt;&#x2F;code&gt; and sops-nix at your own pace. The sops-nix setup is the right long-term answer — encrypted at rest, automatically deployed, rotatable — but it’s not a prerequisite for fetching private flake inputs. One line in your Nix config is all that’s strictly required.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>writeShellScriptBin — Why Every Nix Flake Script Needs an Explicit Shell</title>
        <published>2026-03-22T02:00:00+00:00</published>
        <updated>2026-03-22T02:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/writeshellscriptbin-explicit-shell/"/>
        <id>https://perlpimp.net/blog/writeshellscriptbin-explicit-shell/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/writeshellscriptbin-explicit-shell/">&lt;p&gt;The devShell looks clean. A &lt;code&gt;shellHook&lt;&#x2F;code&gt; that defines a &lt;code&gt;build&lt;&#x2F;code&gt; function, a &lt;code&gt;serve&lt;&#x2F;code&gt; helper, maybe some environment setup. It works perfectly, on your machine. A colleague pulls, enters the devShell with &lt;code&gt;nix develop --command fish&lt;&#x2F;code&gt;, types &lt;code&gt;build&lt;&#x2F;code&gt;, and gets:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;fish: Unknown command: build
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;They’re on fish. You’re on bash. Your shellHook ran fine, in bash, and then fish started without any of the functions it defined.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-inheritance-problem&quot;&gt;The inheritance problem&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;code&gt;nix develop&lt;&#x2F;code&gt; starts bash, not whatever &lt;code&gt;$SHELL&lt;&#x2F;code&gt; points to, and the &lt;code&gt;shellHook&lt;&#x2F;code&gt; runs inside that bash. Plenty of people don’t stay there. Some enter with &lt;code&gt;nix develop --command fish&lt;&#x2F;code&gt;, some let direnv’s &lt;code&gt;use flake&lt;&#x2F;code&gt; load the environment, and some use a wrapper that drops them back into zsh or nushell. In every case only exported environment variables make it across. Functions and aliases defined in the hook stay behind in the bash process. And if someone tries to &lt;code&gt;source&lt;&#x2F;code&gt; the hook or a helper script into their own shell instead, anything beyond trivial &lt;code&gt;export&lt;&#x2F;code&gt; statements will break.&lt;&#x2F;p&gt;
&lt;p&gt;This isn’t a bug. It’s how &lt;code&gt;mkShell&lt;&#x2F;code&gt; works: the shellHook is a snippet of bash with no shebang, evaluated inside that one bash session and nowhere else.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-ways-it-breaks&quot;&gt;The ways it breaks&lt;&#x2F;h2&gt;
&lt;p&gt;The list of bash-isms that fail in fish is longer than you’d expect:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;set -euo pipefail&lt;&#x2F;code&gt; — fish error: &lt;code&gt;set: Unknown option &#x27;-e&#x27;&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;[[ ... ]]&lt;&#x2F;code&gt; double-bracket tests — fish error: &lt;code&gt;Unknown command &#x27;[[&#x27;&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;if [ -z &quot;$VAR&quot; ]; then ... fi&lt;&#x2F;code&gt; — fish uses &lt;code&gt;if test -z &quot;$VAR&quot;; ... end&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;for f in *.txt; do ... done&lt;&#x2F;code&gt; — fish uses &lt;code&gt;for f in *.txt; ... end&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Array syntax &lt;code&gt;arr=(a b c)&lt;&#x2F;code&gt; — fish uses &lt;code&gt;set arr a b c&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Process substitution &lt;code&gt;&amp;lt;(cmd)&lt;&#x2F;code&gt; — doesn’t exist in fish&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;source script.sh&lt;&#x2F;code&gt; — fish tries to interpret the bash syntax and fails&lt;&#x2F;li&gt;
&lt;li&gt;Function definitions &lt;code&gt;build() { ... }&lt;&#x2F;code&gt; — fish uses &lt;code&gt;function build; ... end&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Basically everything structural. Variables and &lt;code&gt;export&lt;&#x2F;code&gt; work. Control flow, functions, and error handling don’t.&lt;&#x2F;p&gt;
&lt;p&gt;Here’s what a typical broken shellHook looks like:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;devShells.default = pkgs.mkShell {
  shellHook = &amp;#x27;&amp;#x27;
    build() {
      set -euo pipefail
      echo &amp;quot;Building...&amp;quot;
      ${pkgs.zola}&amp;#x2F;bin&amp;#x2F;zola build
    }
  &amp;#x27;&amp;#x27;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Bash user gets a working &lt;code&gt;build&lt;&#x2F;code&gt; function. Fish user gets &lt;code&gt;Unknown command&lt;&#x2F;code&gt;, or a wall of syntax errors if they try to source the hook into fish. They’ll either switch to bash, delete the shellHook, or stop using your devShell. None of those is what you wanted.&lt;&#x2F;p&gt;
&lt;p&gt;And it’s not just shellHook. Say you put a &lt;code&gt;scripts&#x2F;build.sh&lt;&#x2F;code&gt; in the repo, add it to PATH, and forget the shebang. How it runs then depends on whichever shell launches it, not on anything you wrote. Same problem, less obvious cause.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-fix-writeshellscriptbin&quot;&gt;The fix: writeShellScriptBin&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;code&gt;writeShellScriptBin&lt;&#x2F;code&gt; creates a standalone executable in the Nix store with an explicit bash shebang:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;serve = pkgs.writeShellScriptBin &amp;quot;serve&amp;quot; &amp;#x27;&amp;#x27;
  echo &amp;quot;Serving at http:&amp;#x2F;&amp;#x2F;localhost:1111&amp;quot;
  ${pkgs.zola}&amp;#x2F;bin&amp;#x2F;zola serve --port 1111
&amp;#x27;&amp;#x27;;

build = pkgs.writeShellScriptBin &amp;quot;build&amp;quot; &amp;#x27;&amp;#x27;
  set -euo pipefail
  echo &amp;quot;Building site...&amp;quot;
  ${pkgs.zola}&amp;#x2F;bin&amp;#x2F;zola build
  echo &amp;quot;Done. Output in public&amp;#x2F;&amp;quot;
&amp;#x27;&amp;#x27;;

devShells.default = pkgs.mkShell {
  packages = [ serve build ];
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Each script becomes a real executable at &lt;code&gt;&#x2F;nix&#x2F;store&#x2F;...-serve&#x2F;bin&#x2F;serve&lt;&#x2F;code&gt;. The generated script starts with &lt;code&gt;#!&#x2F;nix&#x2F;store&#x2F;...&#x2F;bin&#x2F;bash&lt;&#x2F;code&gt;. It does not use &lt;code&gt;#!&#x2F;bin&#x2F;bash&lt;&#x2F;code&gt;, which doesn’t exist on NixOS: the only thing in &lt;code&gt;&#x2F;bin&lt;&#x2F;code&gt; there is &lt;code&gt;sh&lt;&#x2F;code&gt;, for POSIX compatibility. You could use &lt;code&gt;#!&#x2F;usr&#x2F;bin&#x2F;env bash&lt;&#x2F;code&gt;, but that depends on &lt;code&gt;bash&lt;&#x2F;code&gt; being in PATH at execution time. The Nix store path sidesteps both problems because it’s a direct, absolute reference to a specific bash derivation.&lt;&#x2F;p&gt;
&lt;p&gt;When the user types &lt;code&gt;serve&lt;&#x2F;code&gt;, their shell finds the executable in PATH, and the kernel reads the shebang and starts bash. Doesn’t matter if they’re running fish, zsh, nushell, or something they wrote themselves last weekend. The script always executes under bash.&lt;&#x2F;p&gt;
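&lt;p&gt;The same wrapper works for logic that already lives in a file such as &lt;code&gt;scripts&#x2F;build.sh&lt;&#x2F;code&gt;. Here is a minimal sketch, assuming that path sits next to &lt;code&gt;flake.nix&lt;&#x2F;code&gt;. &lt;code&gt;builtins.readFile&lt;&#x2F;code&gt; pulls the text in at evaluation time, and &lt;code&gt;writeShellScriptBin&lt;&#x2F;code&gt; puts the store-path bash shebang in front of it. Any shebang the file has of its own becomes an ordinary comment.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# The file&amp;#x27;s contents become the script body; the store-path shebang is added for you.
build = pkgs.writeShellScriptBin &amp;quot;build&amp;quot; (builtins.readFile .&amp;#x2F;scripts&amp;#x2F;build.sh);

devShells.default = pkgs.mkShell {
  packages = [ build ];
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Nothing in the file is interpolated, so the commands it calls still come from PATH. Full store paths avoid that.&lt;&#x2F;p&gt;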
&lt;h2 id=&quot;full-path-dependencies&quot;&gt;Full-path dependencies&lt;&#x2F;h2&gt;
&lt;p&gt;Notice the &lt;code&gt;${pkgs.zola}&#x2F;bin&#x2F;zola&lt;&#x2F;code&gt; in the examples above. That Nix interpolation expands at evaluation time, before the script is even written, to something like &lt;code&gt;&#x2F;nix&#x2F;store&#x2F;abc123-zola-0.22.0&#x2F;bin&#x2F;zola&lt;&#x2F;code&gt;. The script doesn’t rely on &lt;code&gt;zola&lt;&#x2F;code&gt; being in PATH, because it contains the absolute store path to the exact binary.&lt;&#x2F;p&gt;
&lt;p&gt;This matters. A script that calls bare &lt;code&gt;zola&lt;&#x2F;code&gt; works inside &lt;code&gt;nix develop&lt;&#x2F;code&gt; (where the devShell puts zola in PATH). It fails if someone runs it outside the shell, or if the PATH gets modified. Full store paths make the script self-contained, so it behaves the same whatever the caller’s PATH looks like.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;build = pkgs.writeShellScriptBin &amp;quot;build&amp;quot; &amp;#x27;&amp;#x27;
  set -euo pipefail
  TYPST_FONT_PATHS=&amp;quot;${pkgs.inter}&amp;#x2F;share&amp;#x2F;fonts&amp;quot; \
    ${pkgs.typst}&amp;#x2F;bin&amp;#x2F;typst compile cv-template.typ static&amp;#x2F;cv.pdf
  ${pkgs.zola}&amp;#x2F;bin&amp;#x2F;zola build
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Every dependency is pinned to a specific Nix store path. The script’s closure captures exactly what it needs. This is the same property that makes NixOS system configurations reproducible — applied to your little helper scripts.&lt;&#x2F;p&gt;
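&lt;p&gt;Since the script no longer leans on the devShell’s PATH, you can also expose it as a flake app and run it without entering the shell. A sketch, assuming &lt;code&gt;build&lt;&#x2F;code&gt; is the derivation above and &lt;code&gt;system&lt;&#x2F;code&gt; is bound in your &lt;code&gt;outputs&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;apps.${system}.build = {
  type = &amp;quot;app&amp;quot;;
  # Absolute store path of the generated script; nix run executes it directly.
  program = &amp;quot;${build}&amp;#x2F;bin&amp;#x2F;build&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With that in place, &lt;code&gt;nix run .#build&lt;&#x2F;code&gt; works from any shell, devShell or not.&lt;&#x2F;p&gt;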
&lt;h2 id=&quot;what-shellhook-should-actually-do&quot;&gt;What shellHook should actually do&lt;&#x2F;h2&gt;
&lt;p&gt;Keep shellHook minimal. It should only do things that are genuinely shell-agnostic:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Setting environment variables&lt;&#x2F;strong&gt; — &lt;code&gt;export&lt;&#x2F;code&gt; works in bash, zsh, and modern fish&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Printing a welcome message&lt;&#x2F;strong&gt; — &lt;code&gt;echo&lt;&#x2F;code&gt; is universal&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Setting a prompt hint&lt;&#x2F;strong&gt; — if you must&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;That’s about it. Anything with control flow, function definitions, or bash-specific syntax belongs in a &lt;code&gt;writeShellScriptBin&lt;&#x2F;code&gt; package. The boundary is clear: if it would break in fish, it doesn’t belong in shellHook.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;devShells.default = pkgs.mkShell {
  packages = [ serve build build-pdf ];

  shellHook = &amp;#x27;&amp;#x27;
    export PROJECT_ROOT=&amp;quot;$(pwd)&amp;quot;
    echo &amp;quot;Commands: serve, build, build-pdf&amp;quot;
  &amp;#x27;&amp;#x27;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Environment setup in the hook, real logic in wrapped scripts. Your fish users stop filing issues, your bash users notice no difference.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;writeshellapplication-the-stricter-variant&quot;&gt;writeShellApplication — the stricter variant&lt;&#x2F;h2&gt;
&lt;p&gt;If you’re already wrapping scripts with &lt;code&gt;writeShellScriptBin&lt;&#x2F;code&gt;, consider &lt;code&gt;writeShellApplication&lt;&#x2F;code&gt; instead. It does three things on top:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Adds &lt;code&gt;set -euo pipefail&lt;&#x2F;code&gt; automatically&lt;&#x2F;strong&gt; — you don’t need to remember it&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Runs shellcheck at build time&lt;&#x2F;strong&gt; — catches bugs before the script ever executes&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Puts dependencies in &lt;code&gt;runtimeInputs&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; — added to PATH, so you can use bare command names&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;build = pkgs.writeShellApplication {
  name = &amp;quot;build&amp;quot;;
  runtimeInputs = [ pkgs.zola pkgs.typst ];
  text = &amp;#x27;&amp;#x27;
    echo &amp;quot;Building site...&amp;quot;
    TYPST_FONT_PATHS=&amp;quot;${pkgs.inter}&amp;#x2F;share&amp;#x2F;fonts&amp;quot; \
      typst compile cv-template.typ static&amp;#x2F;cv.pdf
    zola build
  &amp;#x27;&amp;#x27;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The trade-off: &lt;code&gt;runtimeInputs&lt;&#x2F;code&gt; doesn’t rewrite commands into absolute paths. Instead, the generated script prepends the listed packages’ &lt;code&gt;bin&lt;&#x2F;code&gt; directories to PATH before your code runs. Those tools are still pinned. A command you forgot to list, though, quietly falls through to whatever the caller’s PATH provides. In exchange, the script reads like ordinary shell. For project devShell scripts where readability matters more than absolute hermeticity, this is usually the right call.&lt;&#x2F;p&gt;
&lt;p&gt;The output looks the same as &lt;code&gt;writeShellScriptBin&lt;&#x2F;code&gt;’s: a derivation with &lt;code&gt;bin&#x2F;&amp;lt;name&amp;gt;&lt;&#x2F;code&gt;. The difference is that the build runs shellcheck over the script, and if shellcheck finds issues, the build fails. You’ll discover that your &lt;code&gt;$variable&lt;&#x2F;code&gt; should be &lt;code&gt;&quot;$variable&quot;&lt;&#x2F;code&gt; at &lt;code&gt;nix build&lt;&#x2F;code&gt; time instead of at 2 AM in production.&lt;&#x2F;p&gt;
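&lt;p&gt;When a shellcheck warning is intentional, you can switch off that one check rather than going back to &lt;code&gt;writeShellScriptBin&lt;&#x2F;code&gt;. A sketch, assuming a nixpkgs recent enough to have the &lt;code&gt;excludeShellChecks&lt;&#x2F;code&gt; argument; the &lt;code&gt;ZOLA_FLAGS&lt;&#x2F;code&gt; variable is made up for the example:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;build = pkgs.writeShellApplication {
  name = &amp;quot;build&amp;quot;;
  runtimeInputs = [ pkgs.zola ];
  # Only SC2086 (unquoted expansion) is skipped; any other finding still fails the build.
  excludeShellChecks = [ &amp;quot;SC2086&amp;quot; ];
  text = &amp;#x27;&amp;#x27;
    # Deliberately unquoted so ZOLA_FLAGS can carry several flags; empty if unset.
    zola build &amp;#x27;&amp;#x27;${ZOLA_FLAGS:-}
  &amp;#x27;&amp;#x27;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;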
&lt;h2 id=&quot;when-to-use-which&quot;&gt;When to use which&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;writeShellScriptBin&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; — when you want full control. You manage &lt;code&gt;set -euo pipefail&lt;&#x2F;code&gt; yourself, use full Nix store paths for dependencies, and don’t want shellcheck opinions about your quoting.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;writeShellApplication&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; — when you want guardrails. Automatic strict mode, shellcheck enforcement, cleaner dependency declaration. This is the default choice for most devShell scripts.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;shellHook&lt;&#x2F;strong&gt; — only for environment variables and welcome messages. Nothing else.&lt;&#x2F;p&gt;
&lt;p&gt;The underlying principle is the same for both: an explicit shebang pointing at a Nix store bash, guaranteeing your script runs under the shell you wrote it for. Everything else is ergonomics.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Distributing a Private CLI via Homebrew with Nix Cross-Compilation</title>
        <published>2026-03-21T12:00:00+00:00</published>
        <updated>2026-03-21T12:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/private-homebrew-tap-nix/"/>
        <id>https://perlpimp.net/blog/private-homebrew-tap-nix/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/private-homebrew-tap-nix/">&lt;p&gt;You have a private CLI tool. It’s built with Nix, produces &lt;a href=&quot;&#x2F;blog&#x2F;portable-rust-binaries-nix&#x2F;&quot;&gt;static binaries for four platforms&lt;&#x2F;a&gt;, and works beautifully — on your machine. Now your teammates want it, and they don’t have Nix.&lt;&#x2F;p&gt;
&lt;p&gt;What you want is &lt;code&gt;brew install my-cli&lt;&#x2F;code&gt;. One command, any Mac or Linux box, no Nix required. But the source repo is private. Homebrew can’t authenticate against it. You can’t just point a formula at your GitHub Releases and call it a day.&lt;&#x2F;p&gt;
&lt;p&gt;This post covers the pattern: a private Homebrew tap repo, Nix cross-compilation, and a release script that builds, uploads, and generates the formula in one shot.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-auth-problem&quot;&gt;The auth problem&lt;&#x2F;h2&gt;
&lt;p&gt;Homebrew formulas download tarballs via HTTPS. When you &lt;code&gt;brew install something&lt;&#x2F;code&gt;, Homebrew fetches the URL in the formula’s &lt;code&gt;url&lt;&#x2F;code&gt; field — no authentication, no SSH keys, just a plain HTTP GET. If your source repo is private, that GET returns a 404.&lt;&#x2F;p&gt;
&lt;p&gt;This is the fundamental constraint. The formula needs to download binaries from somewhere Homebrew can reach.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-tap-repo-trick&quot;&gt;The tap repo trick&lt;&#x2F;h2&gt;
&lt;p&gt;The solution is a separate repo — a &lt;em&gt;tap&lt;&#x2F;em&gt; — that’s also private but becomes accessible once a user runs &lt;code&gt;brew tap&lt;&#x2F;code&gt;. Here’s how it works:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;You create a private repo named &lt;code&gt;homebrew-tap&lt;&#x2F;code&gt; (or &lt;code&gt;homebrew-tools&lt;&#x2F;code&gt;, &lt;code&gt;homebrew-internal&lt;&#x2F;code&gt; — the &lt;code&gt;homebrew-&lt;&#x2F;code&gt; prefix is what matters)&lt;&#x2F;li&gt;
&lt;li&gt;Your teammates run &lt;code&gt;brew tap myorg&#x2F;tap&lt;&#x2F;code&gt;, which clones &lt;code&gt;github.com&#x2F;myorg&#x2F;homebrew-tap&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Once tapped, Homebrew can access that repo’s GitHub Releases via authenticated HTTPS&lt;&#x2F;li&gt;
&lt;li&gt;Your formula points to release assets on the &lt;em&gt;tap&lt;&#x2F;em&gt; repo, not the source repo&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;The naming convention is load-bearing. Homebrew automatically prepends &lt;code&gt;homebrew-&lt;&#x2F;code&gt; to the short name and assumes GitHub:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;brew tap myorg&amp;#x2F;tap
# → clones https:&amp;#x2F;&amp;#x2F;github.com&amp;#x2F;myorg&amp;#x2F;homebrew-tap

brew tap myorg&amp;#x2F;tap git@github.com:myorg&amp;#x2F;homebrew-tap.git
# → same tap, cloned over SSH instead
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;No full URL needed for the default. Homebrew expands the short name to an HTTPS GitHub URL, and git authenticates the clone with whatever GitHub credentials your teammates already have configured. Teammates who’d rather use their SSH keys (they have them; they’re developers) can pass the SSH URL as the second argument. For HTTPS-only setups, &lt;code&gt;HOMEBREW_GITHUB_API_TOKEN&lt;&#x2F;code&gt; handles auth.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;tap-repo-structure&quot;&gt;Tap repo structure&lt;&#x2F;h2&gt;
&lt;p&gt;Minimal. One directory, one file:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;homebrew-tap&amp;#x2F;
└── Formula&amp;#x2F;
    └── my-cli.rb
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That’s it. No Brewfile, no Casks, no scripts. The release assets live in GitHub Releases on this same repo.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-formula&quot;&gt;The formula&lt;&#x2F;h2&gt;
&lt;p&gt;A Homebrew formula is a Ruby class that tells &lt;code&gt;brew&lt;&#x2F;code&gt; where to download binaries and how to install them. For a multi-platform CLI:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;ruby&quot; class=&quot;language-ruby &quot;&gt;&lt;code class=&quot;language-ruby&quot; data-lang=&quot;ruby&quot;&gt;class MyCli &amp;lt; Formula
  desc &amp;quot;Description of your CLI tool&amp;quot;
  homepage &amp;quot;https:&amp;#x2F;&amp;#x2F;github.com&amp;#x2F;myorg&amp;#x2F;homebrew-tap&amp;quot;
  version &amp;quot;0.1.0&amp;quot;
  license &amp;quot;Proprietary&amp;quot;

  on_macos do
    if Hardware::CPU.arm?
      url &amp;quot;https:&amp;#x2F;&amp;#x2F;github.com&amp;#x2F;myorg&amp;#x2F;homebrew-tap&amp;#x2F;releases&amp;#x2F;download&amp;#x2F;my-cli-v0.1.0&amp;#x2F;my-cli-0.1.0-darwin-arm64.tar.gz&amp;quot;
      sha256 &amp;quot;DARWIN_ARM64_SHA256&amp;quot;
    else
      url &amp;quot;https:&amp;#x2F;&amp;#x2F;github.com&amp;#x2F;myorg&amp;#x2F;homebrew-tap&amp;#x2F;releases&amp;#x2F;download&amp;#x2F;my-cli-v0.1.0&amp;#x2F;my-cli-0.1.0-darwin-amd64.tar.gz&amp;quot;
      sha256 &amp;quot;DARWIN_AMD64_SHA256&amp;quot;
    end
  end

  on_linux do
    if Hardware::CPU.arm?
      url &amp;quot;https:&amp;#x2F;&amp;#x2F;github.com&amp;#x2F;myorg&amp;#x2F;homebrew-tap&amp;#x2F;releases&amp;#x2F;download&amp;#x2F;my-cli-v0.1.0&amp;#x2F;my-cli-0.1.0-linux-arm64.tar.gz&amp;quot;
      sha256 &amp;quot;LINUX_ARM64_SHA256&amp;quot;
    else
      url &amp;quot;https:&amp;#x2F;&amp;#x2F;github.com&amp;#x2F;myorg&amp;#x2F;homebrew-tap&amp;#x2F;releases&amp;#x2F;download&amp;#x2F;my-cli-v0.1.0&amp;#x2F;my-cli-0.1.0-linux-amd64.tar.gz&amp;quot;
      sha256 &amp;quot;LINUX_AMD64_SHA256&amp;quot;
    end
  end

  def install
    bin.install &amp;quot;my-cli&amp;quot;
  end

  test do
    assert_match &amp;quot;my-cli&amp;quot;, shell_output(&amp;quot;#{bin}&amp;#x2F;my-cli --version&amp;quot;)
  end
end
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Four platform blocks, four sha256 hashes, one &lt;code&gt;bin.install&lt;&#x2F;code&gt;. The URLs all point to the tap repo’s releases — not the source repo.&lt;&#x2F;p&gt;
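&lt;p&gt;The formula’s URLs and the release script’s &lt;code&gt;TAG&lt;&#x2F;code&gt; and &lt;code&gt;FILENAME&lt;&#x2F;code&gt; variables have to agree on a single naming scheme. As a sketch, the shell function below prints the four asset URLs from the example’s placeholder values (&lt;code&gt;myorg&lt;&#x2F;code&gt;, &lt;code&gt;homebrew-tap&lt;&#x2F;code&gt;, &lt;code&gt;my-cli&lt;&#x2F;code&gt;, &lt;code&gt;0.1.0&lt;&#x2F;code&gt;). &lt;code&gt;asset_url&lt;&#x2F;code&gt; is a hypothetical helper for illustration, not part of the release script:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper, not part of the release script. GitHub serves release
# assets at /OWNER/REPO/releases/download/TAG/FILE; the tag and file name here
# follow the script&amp;#x27;s TAG=&amp;quot;$PACKAGE-v$VERSION&amp;quot; and
# FILENAME=&amp;quot;$PACKAGE-$VERSION-$PLAT.tar.gz&amp;quot; conventions.
asset_url() {
  local owner=$1 repo=$2 package=$3 version=$4 plat=$5
  printf &amp;#x27;https://github.com/%s/%s/releases/download/%s-v%s/%s-%s-%s.tar.gz\n&amp;#x27; \
    &amp;quot;$owner&amp;quot; &amp;quot;$repo&amp;quot; &amp;quot;$package&amp;quot; &amp;quot;$version&amp;quot; &amp;quot;$package&amp;quot; &amp;quot;$version&amp;quot; &amp;quot;$plat&amp;quot;
}

# Placeholder values from the example formula above.
for plat in darwin-arm64 darwin-amd64 linux-arm64 linux-amd64; do
  asset_url myorg homebrew-tap my-cli 0.1.0 &amp;quot;$plat&amp;quot;
done
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;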
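&lt;p&gt;The &lt;code&gt;test&lt;&#x2F;code&gt; block is optional but good practice: once the formula is installed, &lt;code&gt;brew test my-cli&lt;&#x2F;code&gt; runs it. &lt;code&gt;shell_output&lt;&#x2F;code&gt; also fails the test if the command exits non-zero, so a binary that can’t start on the user’s platform is caught as well.&lt;&#x2F;p&gt;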
&lt;h2 id=&quot;the-release-script&quot;&gt;The release script&lt;&#x2F;h2&gt;
&lt;p&gt;You don’t want to build four tarballs, compute four hashes, upload them, edit the formula, commit, and push — by hand. That’s the kind of process you do correctly twice and then fumble on the third.&lt;&#x2F;p&gt;
&lt;p&gt;Instead, add a &lt;code&gt;homebrew-release&lt;&#x2F;code&gt; package to your flake. It’s a shell script with Nix-provided dependencies:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;homebrew-release = let
  # grep and sed come from Nix too, rather than whatever the host PATH provides
  runtimeDeps = with pkgs; [ gh coreutils gnutar gzip git openssh jq gnugrep gnused ];
in pkgs.writeShellScriptBin &amp;quot;homebrew-release&amp;quot; &amp;#x27;&amp;#x27;
  export PATH=&amp;quot;${pkgs.lib.makeBinPath runtimeDeps}:$PATH&amp;quot;
  set -euo pipefail

  GH_OWNER=&amp;quot;myorg&amp;quot;
  GH_REPO=&amp;quot;my-cli&amp;quot;
  GH_TAP_REPO=&amp;quot;homebrew-tap&amp;quot;
  PACKAGE=&amp;quot;my-cli&amp;quot;
  TAP_REPO=&amp;quot;git@github.com:$GH_OWNER&amp;#x2F;$GH_TAP_REPO.git&amp;quot;

  if ! gh auth status &amp;amp;&amp;gt;&amp;#x2F;dev&amp;#x2F;null; then
    echo &amp;quot;Error: gh CLI is not authenticated. Run: gh auth login&amp;quot; &amp;gt;&amp;amp;2
    exit 1
  fi

  VERSION=$(grep &amp;#x27;^version&amp;#x27; cli&amp;#x2F;Cargo.toml | head -1 | sed &amp;#x27;s&amp;#x2F;.*&amp;quot;\(.*\)&amp;quot;&amp;#x2F;\1&amp;#x2F;&amp;#x27;)
  TAG=&amp;quot;$PACKAGE-v$VERSION&amp;quot;
  echo &amp;quot;Version: $VERSION (tag: $TAG)&amp;quot;

  WORK=$(mktemp -d)
  trap &amp;#x27;rm -rf &amp;quot;$WORK&amp;quot;&amp;#x27; EXIT

  # Clone tap repo for existing sha256 values
  TAP_DIR=&amp;quot;$WORK&amp;#x2F;tap&amp;quot;
  git clone &amp;quot;$TAP_REPO&amp;quot; &amp;quot;$TAP_DIR&amp;quot; 2&amp;gt;&amp;amp;1 | grep -v &amp;quot;^warning:&amp;quot; || true
  mkdir -p &amp;quot;$TAP_DIR&amp;#x2F;Formula&amp;quot;
  FORMULA=&amp;quot;$TAP_DIR&amp;#x2F;Formula&amp;#x2F;my-cli.rb&amp;quot;

  existing_sha() {
    local plat=$1
    if [ -f &amp;quot;$FORMULA&amp;quot; ]; then
      sed -n &amp;quot;&amp;#x2F;$plat\.tar\.gz&amp;#x2F;{n;s&amp;#x2F;.*sha256 \&amp;quot;\([^\&amp;quot;]*\)\&amp;quot;.*&amp;#x2F;\1&amp;#x2F;p;}&amp;quot; &amp;quot;$FORMULA&amp;quot;
    fi
  }

  # Create or reuse GitHub release on the tap repo
  if gh release view &amp;quot;$TAG&amp;quot; --repo &amp;quot;$GH_OWNER&amp;#x2F;$GH_TAP_REPO&amp;quot; &amp;amp;&amp;gt;&amp;#x2F;dev&amp;#x2F;null; then
    echo &amp;quot;Release $TAG already exists.&amp;quot;
  else
    gh release create &amp;quot;$TAG&amp;quot; \
      --repo &amp;quot;$GH_OWNER&amp;#x2F;$GH_TAP_REPO&amp;quot; \
      --title &amp;quot;$PACKAGE $TAG&amp;quot; \
      --notes &amp;quot;$PACKAGE release $VERSION&amp;quot;
  fi

  EXISTING_ASSETS=$(gh release view &amp;quot;$TAG&amp;quot; \
    --repo &amp;quot;$GH_OWNER&amp;#x2F;$GH_TAP_REPO&amp;quot; \
    --json assets -q &amp;#x27;.assets[].name&amp;#x27; 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null || true)

  DARWIN_ARM64_SHA=&amp;quot;PLACEHOLDER&amp;quot;
  DARWIN_AMD64_SHA=&amp;quot;PLACEHOLDER&amp;quot;
  LINUX_ARM64_SHA=&amp;quot;PLACEHOLDER&amp;quot;
  LINUX_AMD64_SHA=&amp;quot;PLACEHOLDER&amp;quot;

  PLATFORMS=&amp;quot;darwin-arm64:aarch64-darwin darwin-amd64:x86_64-darwin linux-arm64:aarch64-linux linux-amd64:x86_64-linux&amp;quot;

  for ENTRY in $PLATFORMS; do
    PLAT=&amp;quot;&amp;#x27;&amp;#x27;${ENTRY%%:*}&amp;quot;
    NIX_SYS=&amp;quot;&amp;#x27;&amp;#x27;${ENTRY##*:}&amp;quot;
    FILENAME=&amp;quot;$PACKAGE-$VERSION-$PLAT.tar.gz&amp;quot;

    if echo &amp;quot;$EXISTING_ASSETS&amp;quot; | grep -qF &amp;quot;$FILENAME&amp;quot;; then
      SHA=$(existing_sha &amp;quot;$PLAT&amp;quot;)
      if [ -n &amp;quot;$SHA&amp;quot; ] &amp;amp;&amp;amp; [ &amp;quot;$SHA&amp;quot; != &amp;quot;PLACEHOLDER&amp;quot; ]; then
        echo &amp;quot;[$PLAT] Already published (sha256: &amp;#x27;&amp;#x27;${SHA:0:16}...)&amp;quot;
      else
        echo &amp;quot;[$PLAT] Already published, fetching sha256...&amp;quot;
        gh release download &amp;quot;$TAG&amp;quot; \
          --repo &amp;quot;$GH_OWNER&amp;#x2F;$GH_TAP_REPO&amp;quot; \
          --pattern &amp;quot;$FILENAME&amp;quot; \
          --dir &amp;quot;$WORK&amp;quot;
        SHA=$(sha256sum &amp;quot;$WORK&amp;#x2F;$FILENAME&amp;quot; | cut -d&amp;#x27; &amp;#x27; -f1)
      fi
    else
      echo &amp;quot;[$PLAT] Building .#packages.$NIX_SYS.my-cli-static...&amp;quot;
      OUT=$(nix build &amp;quot;.#packages.$NIX_SYS.my-cli-static&amp;quot; --no-link --print-out-paths)

      BDIR=&amp;quot;$WORK&amp;#x2F;build-$PLAT&amp;quot;
      mkdir -p &amp;quot;$BDIR&amp;quot;
      cp &amp;quot;$OUT&amp;#x2F;bin&amp;#x2F;my-cli&amp;quot; &amp;quot;$BDIR&amp;#x2F;&amp;quot;
      TARBALL=&amp;quot;$WORK&amp;#x2F;$FILENAME&amp;quot;
      tar czf &amp;quot;$TARBALL&amp;quot; -C &amp;quot;$BDIR&amp;quot; my-cli

      SHA=$(sha256sum &amp;quot;$TARBALL&amp;quot; | cut -d&amp;#x27; &amp;#x27; -f1)
      echo &amp;quot;[$PLAT] sha256: &amp;#x27;&amp;#x27;${SHA:0:16}...&amp;quot;

      echo &amp;quot;[$PLAT] Uploading to release...&amp;quot;
      gh release upload &amp;quot;$TAG&amp;quot; &amp;quot;$TARBALL&amp;quot; \
        --repo &amp;quot;$GH_OWNER&amp;#x2F;$GH_TAP_REPO&amp;quot; \
        --clobber
    fi

    case &amp;quot;$PLAT&amp;quot; in
      darwin-arm64) DARWIN_ARM64_SHA=&amp;quot;$SHA&amp;quot; ;;
      darwin-amd64) DARWIN_AMD64_SHA=&amp;quot;$SHA&amp;quot; ;;
      linux-arm64)  LINUX_ARM64_SHA=&amp;quot;$SHA&amp;quot; ;;
      linux-amd64)  LINUX_AMD64_SHA=&amp;quot;$SHA&amp;quot; ;;
    esac
  done

  # Generate the formula with real sha256 values
  # (template omitted for brevity — same structure as above,
  # with $DARWIN_ARM64_SHA etc. interpolated)

  cd &amp;quot;$TAP_DIR&amp;quot;
  git add Formula&amp;#x2F;my-cli.rb
  if git diff --cached --quiet; then
    echo &amp;quot;Formula already up to date.&amp;quot;
  else
    git commit -m &amp;quot;Update my-cli to $VERSION&amp;quot;
    git push -u origin HEAD
    echo &amp;quot;Tap updated.&amp;quot;
  fi
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
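&lt;p&gt;Then expose it in your flake’s per-system package outputs, so that &lt;code&gt;nix run .#homebrew-release&lt;&#x2F;code&gt; resolves to &lt;code&gt;packages.&amp;lt;system&amp;gt;.homebrew-release&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;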
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;packages.homebrew-release = homebrew-release;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A few things worth calling out:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Incremental by design.&lt;&#x2F;strong&gt; If a platform’s tarball is already attached to the release, the script skips the build and reuses its sha256: from the existing formula if it’s recorded there, otherwise by downloading the asset and hashing it. You can resume after a partial failure, say if the linux-amd64 remote builder was down, without rebuilding or re-uploading what’s already done.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;gh&lt;&#x2F;code&gt; for everything.&lt;&#x2F;strong&gt; It creates releases, uploads assets, and downloads them for sha256 recovery. The script checks &lt;code&gt;gh auth status&lt;&#x2F;code&gt; up front and exits if you haven’t run &lt;code&gt;gh auth login&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Clones the tap to a temp dir.&lt;&#x2F;strong&gt; Reads existing sha256 values, writes the updated formula, commits, pushes. The temp dir gets cleaned up on exit.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
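&lt;p&gt;The resume path hinges on the &lt;code&gt;existing_sha&lt;&#x2F;code&gt; helper: its &lt;code&gt;sed&lt;&#x2F;code&gt; program finds the line naming a platform’s tarball, steps to the next line, and prints the quoted &lt;code&gt;sha256&lt;&#x2F;code&gt; value. The sketch below runs the same function against a throwaway formula fragment. The URLs are dummies, and the hash is simply the sha256 of an empty file, used as a stand-in:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;#!/usr/bin/env bash
set -euo pipefail

FORMULA=$(mktemp)
trap &amp;#x27;rm -f &amp;quot;$FORMULA&amp;quot;&amp;#x27; EXIT

# Throwaway formula fragment with dummy URLs. The arm64 hash is the sha256
# of an empty file; amd64 still holds the script&amp;#x27;s PLACEHOLDER value.
cat &amp;gt; &amp;quot;$FORMULA&amp;quot; &amp;lt;&amp;lt;&amp;#x27;EOF&amp;#x27;
  on_linux do
    if Hardware::CPU.arm?
      url &amp;quot;https://example.invalid/my-cli-0.1.0-linux-arm64.tar.gz&amp;quot;
      sha256 &amp;quot;e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855&amp;quot;
    else
      url &amp;quot;https://example.invalid/my-cli-0.1.0-linux-amd64.tar.gz&amp;quot;
      sha256 &amp;quot;PLACEHOLDER&amp;quot;
    end
  end
EOF

# Same function as in the release script: match the tarball name, read the
# next line (n), and print the value between the quotes after sha256.
existing_sha() {
  local plat=$1
  if [ -f &amp;quot;$FORMULA&amp;quot; ]; then
    sed -n &amp;quot;/$plat\.tar\.gz/{n;s/.*sha256 \&amp;quot;\([^\&amp;quot;]*\)\&amp;quot;.*/\1/p;}&amp;quot; &amp;quot;$FORMULA&amp;quot;
  fi
}

existing_sha linux-arm64    # the recorded hash: reused as-is
existing_sha linux-amd64    # PLACEHOLDER: the script downloads and re-hashes
existing_sha darwin-arm64   # empty: not in this fragment, so also re-fetched
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The lookup depends on each &lt;code&gt;sha256&lt;&#x2F;code&gt; line sitting directly under its &lt;code&gt;url&lt;&#x2F;code&gt; line, as it does in the formula above. A generated formula that reordered them would make the script fall back to re-downloading every asset.&lt;&#x2F;p&gt;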
&lt;h2 id=&quot;the-release-workflow&quot;&gt;The release workflow&lt;&#x2F;h2&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# 1. Bump version in cli&amp;#x2F;Cargo.toml
# 2. Run the release script
nix run .#homebrew-release
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;That’s it. The script:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Reads the version from &lt;code&gt;Cargo.toml&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Creates a GitHub Release tagged &lt;code&gt;my-cli-v0.1.0&lt;&#x2F;code&gt; on the tap repo, or reuses it if it already exists&lt;&#x2F;li&gt;
&lt;li&gt;Builds static binaries via &lt;code&gt;nix build .#packages.$system.my-cli-static&lt;&#x2F;code&gt; for each of the four platforms that doesn’t already have a tarball on the release&lt;&#x2F;li&gt;
&lt;li&gt;Tars them up, computes sha256 hashes, and uploads them as release assets&lt;&#x2F;li&gt;
&lt;li&gt;Generates the formula with real hashes and, if it changed, commits and pushes it to the tap repo&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
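&lt;p&gt;Steps 1–3 can be previewed without building or uploading anything. The dry-run sketch below reuses the script’s version extraction and platform loop against a stand-in &lt;code&gt;Cargo.toml&lt;&#x2F;code&gt; written to a temp file (an assumption; the real script reads &lt;code&gt;cli&#x2F;Cargo.toml&lt;&#x2F;code&gt;). Because it is plain Bash rather than a Nix indented string, the parameter expansions are written &lt;code&gt;${ENTRY%%:*}&lt;&#x2F;code&gt; instead of &lt;code&gt;&#x27;&#x27;${ENTRY%%:*}&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;#!/usr/bin/env bash
set -euo pipefail

# Stand-in for cli/Cargo.toml (assumed contents).
CARGO_TOML=$(mktemp)
trap &amp;#x27;rm -f &amp;quot;$CARGO_TOML&amp;quot;&amp;#x27; EXIT
printf &amp;#x27;[package]\nname = &amp;quot;my-cli&amp;quot;\nversion = &amp;quot;0.1.0&amp;quot;\nedition = &amp;quot;2021&amp;quot;\n&amp;#x27; &amp;gt; &amp;quot;$CARGO_TOML&amp;quot;

PACKAGE=&amp;quot;my-cli&amp;quot;
# Same pipeline as the script: first line starting with &amp;quot;version&amp;quot;,
# keep whatever sits between the double quotes.
VERSION=$(grep &amp;#x27;^version&amp;#x27; &amp;quot;$CARGO_TOML&amp;quot; | head -1 | sed &amp;#x27;s/.*&amp;quot;\(.*\)&amp;quot;/\1/&amp;#x27;)
TAG=&amp;quot;$PACKAGE-v$VERSION&amp;quot;
echo &amp;quot;Version: $VERSION (tag: $TAG)&amp;quot;

PLATFORMS=&amp;quot;darwin-arm64:aarch64-darwin darwin-amd64:x86_64-darwin linux-arm64:aarch64-linux linux-amd64:x86_64-linux&amp;quot;
for ENTRY in $PLATFORMS; do
  PLAT=&amp;quot;${ENTRY%%:*}&amp;quot;     # release-asset platform, e.g. darwin-arm64
  NIX_SYS=&amp;quot;${ENTRY##*:}&amp;quot;  # Nix system, e.g. aarch64-darwin
  echo &amp;quot;[$PLAT] would build .#packages.$NIX_SYS.my-cli-static into $PACKAGE-$VERSION-$PLAT.tar.gz&amp;quot;
done
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;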
&lt;p&gt;Your teammates run &lt;code&gt;brew upgrade my-cli&lt;&#x2F;code&gt; and get the new version. No Nix, no Docker, no “download this tarball and put it in your PATH.”&lt;&#x2F;p&gt;
&lt;h2 id=&quot;user-installation&quot;&gt;User installation&lt;&#x2F;h2&gt;
&lt;p&gt;From the user’s perspective — the person who just wants the CLI:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# One-time setup
brew tap myorg&amp;#x2F;tap

# Install
brew install my-cli

# Later
brew upgrade my-cli
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
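&lt;p&gt;Three commands total, one of which they only run once. &lt;code&gt;brew tap myorg&#x2F;tap&lt;&#x2F;code&gt; clones &lt;code&gt;https:&#x2F;&#x2F;github.com&#x2F;myorg&#x2F;homebrew-tap&lt;&#x2F;code&gt; over HTTPS, so users who authenticate to GitHub with SSH keys can pass the remote explicitly instead: &lt;code&gt;brew tap myorg&#x2F;tap git@github.com:myorg&#x2F;homebrew-tap.git&lt;&#x2F;code&gt;. Once access to the tap repo is sorted, it just works. That’s the whole point.&lt;&#x2F;p&gt;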
&lt;h2 id=&quot;public-vs-private-source-repos&quot;&gt;Public vs. private source repos&lt;&#x2F;h2&gt;
&lt;p&gt;If your source repo is public, you don’t need the tap repo trick at all. Point the formula directly at the source repo’s GitHub Releases:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;ruby&quot; class=&quot;language-ruby &quot;&gt;&lt;code class=&quot;language-ruby&quot; data-lang=&quot;ruby&quot;&gt;url &amp;quot;https:&amp;#x2F;&amp;#x2F;github.com&amp;#x2F;myorg&amp;#x2F;my-cli&amp;#x2F;releases&amp;#x2F;download&amp;#x2F;v0.1.0&amp;#x2F;my-cli-0.1.0-darwin-arm64.tar.gz&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Homebrew can fetch from public repos without auth. You still need a tap repo for the formula itself (unless you’re submitting to homebrew-core, which requires open source), but the binaries can live on the source repo.&lt;&#x2F;p&gt;
&lt;p&gt;The whole re-hosting dance — uploading binaries to the tap repo’s releases — only matters when the source repo is private and Homebrew can’t reach it. If that constraint goes away, simplify.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;prerequisites&quot;&gt;Prerequisites&lt;&#x2F;h2&gt;
&lt;ul&gt;
&lt;li&gt;A Nix flake that produces static binaries for four platforms — see &lt;a href=&quot;&#x2F;blog&#x2F;portable-rust-binaries-nix&#x2F;&quot;&gt;Building Portable Rust CLI Binaries with Nix&lt;&#x2F;a&gt; for the full pattern&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;gh&lt;&#x2F;code&gt; CLI installed and authenticated&lt;&#x2F;li&gt;
&lt;li&gt;Remote builders configured for cross-platform builds (Linux from Mac, or vice versa)&lt;&#x2F;li&gt;
&lt;li&gt;A &lt;code&gt;homebrew-tap&lt;&#x2F;code&gt; repo created on GitHub (private, with the &lt;code&gt;homebrew-&lt;&#x2F;code&gt; prefix)&lt;&#x2F;li&gt;
&lt;li&gt;SSH access to the tap repo (or &lt;code&gt;HOMEBREW_GITHUB_API_TOKEN&lt;&#x2F;code&gt; for HTTPS)&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The Nix cross-compilation and Homebrew tap are independent problems that happen to compose well. Nix gives you reproducible, hermetic builds across four targets. Homebrew gives you a distribution channel that meets developers where they already are. The release script is the glue.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Monitoring PostgreSQL on NixOS with pg_exporter</title>
        <published>2026-03-20T02:00:00+00:00</published>
        <updated>2026-03-20T02:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/pg-exporter-nixos/"/>
        <id>https://perlpimp.net/blog/pg-exporter-nixos/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/pg-exporter-nixos/">&lt;p&gt;You set up &lt;code&gt;services.prometheus.exporters.postgres&lt;&#x2F;code&gt; on your NixOS box. It works. You get a handful of metrics — connection counts, database sizes, a few transaction stats. You wire up a basic Grafana dashboard and call it done.&lt;&#x2F;p&gt;
&lt;p&gt;Then something goes sideways. Autovacuum is running hot, WAL generation is spiking, your cache hit ratio tanked overnight, and the built-in exporter doesn’t have an opinion on any of it. You find yourself writing custom SQL queries, bolting them onto the exporter’s &lt;code&gt;--extend.query-path&lt;&#x2F;code&gt;, managing a YAML file by hand, and wondering if someone has already solved this.&lt;&#x2F;p&gt;
&lt;p&gt;Someone has.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;pg-exporter-vs-postgres-exporter&quot;&gt;pg_exporter vs postgres_exporter&lt;&#x2F;h2&gt;
&lt;p&gt;The &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;prometheus-community&#x2F;postgres_exporter&quot;&gt;prometheus-postgres-exporter&lt;&#x2F;a&gt; in nixpkgs is the community standard. It does the job for basic monitoring. But if you want deep PostgreSQL observability — the kind where you catch problems before they page you — it starts to feel thin.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;pgsty&#x2F;pg_exporter&quot;&gt;pg_exporter&lt;&#x2F;a&gt;, from the &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;pigsty.io&quot;&gt;Pigsty&lt;&#x2F;a&gt; project, ships over 50 collector definitions out of the box. WAL generation rates, checkpoint durations, lock breakdowns by mode, replication lag, sequential scan ratios, dead tuple counts, temp file writes — all pre-configured. Each collector is a standalone YAML file that can be individually enabled or disabled. It supports auto-discovery of all databases on the server. And it handles pgbouncer metrics natively if you run a connection pooler.&lt;&#x2F;p&gt;
&lt;p&gt;The tradeoff is that pg_exporter isn’t in nixpkgs. You need to package it yourself and write a NixOS module, which turns out to be a good illustration of how pleasant NixOS module design makes this kind of thing.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-package&quot;&gt;The package&lt;&#x2F;h2&gt;
&lt;p&gt;pg_exporter is a Go binary. Packaging it is straightforward with &lt;code&gt;buildGoModule&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;pkgs.buildGoModule {
  pname = &amp;quot;pg-exporter&amp;quot;;
  version = &amp;quot;1.2.0&amp;quot;;
  src = sources.pg-exporter;  # github:pgsty&amp;#x2F;pg_exporter&amp;#x2F;v1.2.0
  vendorHash = &amp;quot;sha256-j5RwjJPUDh9AUYqqfvnvcDnGrykRh+6ydbaRsTutO+U=&amp;quot;;
  doCheck = false;

  postInstall = &amp;#x27;&amp;#x27;
    mkdir -p $out&amp;#x2F;share&amp;#x2F;pg_exporter
    cp config&amp;#x2F;*.yml $out&amp;#x2F;share&amp;#x2F;pg_exporter&amp;#x2F;
  &amp;#x27;&amp;#x27;;

  meta = {
    description = &amp;quot;Advanced PostgreSQL &amp;amp; Pgbouncer metrics exporter for Prometheus&amp;quot;;
    homepage = &amp;quot;https:&amp;#x2F;&amp;#x2F;github.com&amp;#x2F;pgsty&amp;#x2F;pg_exporter&amp;quot;;
    license = pkgs.lib.licenses.asl20;
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;postInstall&lt;&#x2F;code&gt; is the important bit. pg_exporter’s repo has a &lt;code&gt;config&#x2F;&lt;&#x2F;code&gt; directory with all 50+ collector YAML files. Each defines queries, metric types, and metadata for a specific area of PostgreSQL internals. The binary reads a config directory at startup and loads every YAML it finds. By copying those into &lt;code&gt;$out&#x2F;share&#x2F;pg_exporter&#x2F;&lt;&#x2F;code&gt;, the module can reference them at build time — no runtime fetching, no mutable state.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;getting-it-into-your-flake&quot;&gt;Getting it into your flake&lt;&#x2F;h2&gt;
&lt;p&gt;You have two paths depending on how you like to consume third-party NixOS packages.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;as-a-nur-package&quot;&gt;As a NUR package&lt;&#x2F;h3&gt;
&lt;p&gt;The package and module live in my &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;nix-community&#x2F;NUR&quot;&gt;NUR&lt;&#x2F;a&gt; repository. Add NUR as a flake input and import the module from &lt;code&gt;nur.repos.ijohanne&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs = {
    nixpkgs.url = &amp;quot;github:NixOS&amp;#x2F;nixpkgs&amp;#x2F;nixos-unstable&amp;quot;;
    nur.url = &amp;quot;github:nix-community&amp;#x2F;NUR&amp;quot;;
  };

  outputs = { nixpkgs, nur, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      modules = [
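        # NUR publishes repos per system under legacyPackages; x86_64-linux is an example
        nur.legacyPackages.x86_64-linux.repos.ijohanne.modules.pg-exporter
        {
          services.pg-exporter.enable = true;
        }
      ];
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;NUR’s flake exposes each repo under &lt;code&gt;legacyPackages.&amp;lt;system&amp;gt;.repos&lt;&#x2F;code&gt;, so pick the system your host runs on. The NixOS module and the package are both provided by the repo; you just import the module and use it.&lt;&#x2F;p&gt;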
&lt;h3 id=&quot;as-a-direct-flake-input&quot;&gt;As a direct flake input&lt;&#x2F;h3&gt;
&lt;p&gt;If you prefer to skip NUR entirely, add the repo as a direct flake input:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs = {
    nixpkgs.url = &amp;quot;github:NixOS&amp;#x2F;nixpkgs&amp;#x2F;nixos-unstable&amp;quot;;
    ijohanne-nur.url = &amp;quot;github:ijohanne&amp;#x2F;nur-packages&amp;quot;;
  };

  outputs = { nixpkgs, ijohanne-nur, ... }: {
    nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
      modules = [
        ijohanne-nur.nixosModules.pg-exporter
        {
          services.pg-exporter.enable = true;
        }
      ];
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Note the lack of &lt;code&gt;inputs.nixpkgs.follows&lt;&#x2F;code&gt; — the NUR repo pins its own nixpkgs-unstable because pg_exporter requires Go 1.26, which isn’t available on the stable channel yet. This means it pulls in a separate nixpkgs eval, but that only affects the pg_exporter build closure, not the rest of your system.&lt;&#x2F;p&gt;
&lt;p&gt;Either way, once the module is imported the configuration is identical. The rest of this post assumes the module is available — how it got there is up to you.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-nixos-module&quot;&gt;The NixOS module&lt;&#x2F;h2&gt;
&lt;p&gt;Here’s where it gets interesting. The module needs to do three things: manage collector configuration, run the service, and optionally wire up Grafana and Prometheus.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;options&quot;&gt;Options&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;options.services.pg-exporter = {
  enable = mkEnableOption &amp;quot;pg_exporter PostgreSQL metrics exporter&amp;quot;;
  port = mkOption { type = types.port; default = 9630; };
  listenAddress = mkOption { type = types.str; default = &amp;quot;0.0.0.0&amp;quot;; };
  environmentFile = mkOption { type = types.nullOr types.path; default = null; };
  defaultCollectors = mkOption { type = types.bool; default = true; };
  disabledCollectors = mkOption { type = types.listOf types.str; default = [ ]; };
  settings = mkOption { type = settingsFormat.type; default = { }; };
  autoDiscovery = mkOption { type = types.bool; default = false; };
  extraFlags = mkOption { type = types.listOf types.str; default = [ ]; };
  user = mkOption { type = types.nullOr types.str; default = null; };
  enableLocalScraping = mkEnableOption &amp;quot;scraping by local prometheus&amp;quot;;
  grafanaDashboard = mkEnableOption &amp;quot;Grafana dashboard provisioning&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;defaultCollectors&lt;&#x2F;code&gt; controls whether the bundled YAML files ship. &lt;code&gt;disabledCollectors&lt;&#x2F;code&gt; lets you turn off specific ones by name. &lt;code&gt;settings&lt;&#x2F;code&gt; is a freeform Nix attrset that renders to YAML — for custom queries. &lt;code&gt;user&lt;&#x2F;code&gt; lets you run as &lt;code&gt;postgres&lt;&#x2F;code&gt; for socket auth instead of the default &lt;code&gt;DynamicUser&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;config-directory-assembly&quot;&gt;Config directory assembly&lt;&#x2F;h3&gt;
&lt;p&gt;This is the core mechanism. pg_exporter reads all YAML files in its config directory alphabetically and merges them. The module defines that directory as a derivation, so it’s assembled once at build time and lands read-only in the Nix store:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;settingsFormat = pkgs.formats.yaml { };

disabledFile = pkgs.writeText &amp;quot;pg-exporter-disabled.yml&amp;quot; (builtins.toJSON
  (builtins.listToAttrs (map (name: {
    inherit name;
    value = { skip = true; };
  }) cfg.disabledCollectors)));

settingsFile = settingsFormat.generate &amp;quot;pg-exporter-custom.yml&amp;quot; cfg.settings;

configDir = pkgs.runCommand &amp;quot;pg-exporter-config&amp;quot; { } (&amp;#x27;&amp;#x27;
  mkdir -p $out
&amp;#x27;&amp;#x27; + optionalString cfg.defaultCollectors &amp;#x27;&amp;#x27;
  cp ${package}&amp;#x2F;share&amp;#x2F;pg_exporter&amp;#x2F;*.yml $out&amp;#x2F;
&amp;#x27;&amp;#x27; + optionalString (cfg.disabledCollectors != [ ]) &amp;#x27;&amp;#x27;
  cp ${disabledFile} $out&amp;#x2F;9999-disabled.yml
&amp;#x27;&amp;#x27; + optionalString (cfg.settings != { }) &amp;#x27;&amp;#x27;
  cp ${settingsFile} $out&amp;#x2F;9999-custom.yml
&amp;#x27;&amp;#x27;);
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Three layers:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Default collectors&lt;&#x2F;strong&gt; — copies the 50+ bundled YAMLs from the package&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Disabled collectors&lt;&#x2F;strong&gt; — generates a &lt;code&gt;9999-disabled.yml&lt;&#x2F;code&gt; with &lt;code&gt;skip: true&lt;&#x2F;code&gt; entries&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Custom settings&lt;&#x2F;strong&gt; — renders your Nix attrset to &lt;code&gt;9999-custom.yml&lt;&#x2F;code&gt; via &lt;code&gt;pkgs.formats.yaml&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;The &lt;code&gt;9999-&lt;&#x2F;code&gt; prefix ensures overrides load last. pg_exporter processes files alphabetically, so a later file wins. This is the same pattern you’d use with systemd drop-in directories — convention over configuration.&lt;&#x2F;p&gt;
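&lt;p&gt;To see where the generated files land, here is a sketch that creates empty stand-ins in a temp directory and lists them in byte order (&lt;code&gt;LC_ALL=C&lt;&#x2F;code&gt;). The bundled names &lt;code&gt;0110-pg.yml&lt;&#x2F;code&gt; and &lt;code&gt;0910-pgbouncer_list.yml&lt;&#x2F;code&gt; are illustrative assumptions, not a listing of pg_exporter’s actual &lt;code&gt;config&#x2F;&lt;&#x2F;code&gt; directory:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;#!/usr/bin/env bash
set -euo pipefail

CONFIG_DIR=$(mktemp -d)
trap &amp;#x27;rm -rf &amp;quot;$CONFIG_DIR&amp;quot;&amp;#x27; EXIT

# Two illustrative bundled collectors plus the module&amp;#x27;s two override files.
touch &amp;quot;$CONFIG_DIR&amp;quot;/{0110-pg,0910-pgbouncer_list,9999-custom,9999-disabled}.yml

# Byte-order listing: the 9999- files come last, and between them
# &amp;quot;custom&amp;quot; sorts before &amp;quot;disabled&amp;quot;.
ls &amp;quot;$CONFIG_DIR&amp;quot; | LC_ALL=C sort
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;One consequence, assuming later files win as described above: if a collector name appears both in &lt;code&gt;settings&lt;&#x2F;code&gt; and in &lt;code&gt;disabledCollectors&lt;&#x2F;code&gt;, the &lt;code&gt;skip: true&lt;&#x2F;code&gt; entry from &lt;code&gt;9999-disabled.yml&lt;&#x2F;code&gt; is the one that loads last.&lt;&#x2F;p&gt;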
&lt;h3 id=&quot;the-systemd-service&quot;&gt;The systemd service&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;systemd.services.pg-exporter = {
  description = &amp;quot;pg_exporter PostgreSQL metrics exporter&amp;quot;;
  wantedBy = [ &amp;quot;multi-user.target&amp;quot; ];
  after = [ &amp;quot;network.target&amp;quot; &amp;quot;postgresql.service&amp;quot; ];
  serviceConfig = {
    Restart = &amp;quot;always&amp;quot;;
    ProtectHome = true;
    PrivateTmp = true;
    NoNewPrivileges = true;
  } &amp;#x2F;&amp;#x2F; (if cfg.user != null then {
    User = cfg.user;
  } else {
    DynamicUser = true;
    ProtectSystem = &amp;quot;strict&amp;quot;;
  }) &amp;#x2F;&amp;#x2F; {
    ExecStart = concatStringsSep &amp;quot; &amp;quot; ([
      &amp;quot;${getBin package}&amp;#x2F;bin&amp;#x2F;pg_exporter&amp;quot;
      &amp;quot;--config=${configDir}&amp;quot;
      &amp;quot;--web.listen-address=${cfg.listenAddress}:${toString cfg.port}&amp;quot;
    ]
    ++ optional cfg.autoDiscovery &amp;quot;--auto-discovery&amp;quot;
    ++ cfg.extraFlags);
  } &amp;#x2F;&amp;#x2F; optionalAttrs (cfg.environmentFile != null) {
    EnvironmentFile = cfg.environmentFile;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;When &lt;code&gt;user&lt;&#x2F;code&gt; is null, the service uses &lt;code&gt;DynamicUser&lt;&#x2F;code&gt; with strict system protection — no writable paths, no privilege escalation. If you set &lt;code&gt;user = &quot;postgres&quot;&lt;&#x2F;code&gt;, it drops &lt;code&gt;DynamicUser&lt;&#x2F;code&gt; and &lt;code&gt;ProtectSystem&lt;&#x2F;code&gt; so it can use the postgres Unix socket for authentication. This avoids needing a password in the environment file entirely.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;one-toggle-grafana-and-prometheus&quot;&gt;One-toggle Grafana and Prometheus&lt;&#x2F;h3&gt;
&lt;p&gt;The last piece is integration wiring:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# Grafana dashboard provisioning
services.grafana.provision.dashboards.settings.providers =
  mkIf cfg.grafanaDashboard [{
    name = &amp;quot;pg-exporter&amp;quot;;
    options.path = pkgs.runCommand &amp;quot;pg-exporter-dashboards&amp;quot; { } &amp;#x27;&amp;#x27;
      mkdir -p $out
      cp ${.&amp;#x2F;dashboard.json} $out&amp;#x2F;postgres.json
    &amp;#x27;&amp;#x27;;
  }];

# Prometheus scrape config
services.prometheus.scrapeConfigs = mkIf cfg.enableLocalScraping [{
  job_name = &amp;quot;postgres&amp;quot;;
  honor_labels = true;
  static_configs = [{
    targets = [ &amp;quot;127.0.0.1:${toString cfg.port}&amp;quot; ];
  }];
}];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;grafanaDashboard = true&lt;&#x2F;code&gt; provisions a dashboard JSON into Grafana’s filesystem provisioning. &lt;code&gt;enableLocalScraping = true&lt;&#x2F;code&gt; adds a Prometheus scrape target pointing at localhost. Both are &lt;code&gt;mkIf&lt;&#x2F;code&gt;-guarded — if you don’t flip the toggle, they produce no configuration.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;nix-to-yaml-custom-collectors&quot;&gt;Nix-to-YAML custom collectors&lt;&#x2F;h2&gt;
&lt;p&gt;The &lt;code&gt;settings&lt;&#x2F;code&gt; option is where &lt;code&gt;pkgs.formats.yaml&lt;&#x2F;code&gt; shines. You define custom queries in Nix and they render to valid YAML without you ever touching a YAML file:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.pg-exporter.settings = {
  my_custom_query = {
    query = &amp;quot;SELECT count(*) as total_users FROM users&amp;quot;;
    metrics = [
      {
        total_users = {
          usage = &amp;quot;GAUGE&amp;quot;;
          description = &amp;quot;Total number of users&amp;quot;;
        };
      }
    ];
    ttl = 60;
    tags = [ &amp;quot;custom&amp;quot; ];
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
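&lt;p&gt;No YAML quoting issues and no hand-managed sidecar file. The module system checks at evaluation time that the value can be rendered as YAML, but &lt;code&gt;pkgs.formats.yaml&lt;&#x2F;code&gt; is a freeform type with no knowledge of pg_exporter’s schema: a misspelled field such as &lt;code&gt;descripton&lt;&#x2F;code&gt; passes evaluation untouched, and whether it gets flagged is up to pg_exporter when it parses the file.&lt;&#x2F;p&gt;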
&lt;h2 id=&quot;the-dashboard&quot;&gt;The dashboard&lt;&#x2F;h2&gt;
&lt;p&gt;The bundled Grafana dashboard covers the metrics that actually matter for day-to-day PostgreSQL operations.&lt;&#x2F;p&gt;
&lt;p&gt;Transactions per second — commits and rollbacks, total across all databases. Zero rollbacks is what you want to see:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;perlpimp.net&#x2F;blog&#x2F;pg-exporter-nixos&#x2F;transactions-per-sec.png&quot; alt=&quot;Transactions per second — commits and rollbacks across all databases&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
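&lt;p&gt;Tuple operations give you the read&#x2F;write breakdown. In &lt;code&gt;pg_stat_database&lt;&#x2F;code&gt; terms, “returned” counts rows read by sequential scans plus index entries returned by index scans, while “fetched” counts live rows fetched by index scans. A large gap between the two usually means sequential scans are reading far more rows than targeted lookups would:&lt;&#x2F;p&gt;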
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;perlpimp.net&#x2F;blog&#x2F;pg-exporter-nixos&#x2F;tuple-operations.png&quot; alt=&quot;Tuple operations per second — fetched, returned, and inserted&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Sequential scans vs index scans. You want the yellow line (index scans) to dominate. When sequential scans spike, either a query plan changed or you’re missing an index:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;perlpimp.net&#x2F;blog&#x2F;pg-exporter-nixos&#x2F;seq-vs-index-scans.png&quot; alt=&quot;Sequential scans vs index scans per second&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Connections by state shows your connection pool saturation at a glance. Here the idle connections (the filled area) hover around 50–60, which is typical for a pooled setup:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;perlpimp.net&#x2F;blog&#x2F;pg-exporter-nixos&#x2F;connections-by-state.png&quot; alt=&quot;Connections by state — active, disabled, and idle&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Sessions per second tracks session creation rate. Flat and low is healthy — spikes mean something is churning connections:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;perlpimp.net&#x2F;blog&#x2F;pg-exporter-nixos&#x2F;sessions-per-sec.png&quot; alt=&quot;Sessions per second — total, abandoned, fatal, and killed&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;All of these panels — plus cache hit ratios, WAL generation rates, checkpoint durations, lock breakdowns, autovacuum activity, database sizes, and temp file writes — ship with a single toggle.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;putting-it-together&quot;&gt;Putting it together&lt;&#x2F;h2&gt;
&lt;p&gt;A minimal production config:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.pg-exporter = {
  enable = true;
  environmentFile = config.sops.secrets.pg-exporter-env.path;
  grafanaDashboard = true;
  enableLocalScraping = true;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The environment file — managed by sops-nix or agenix or however you handle secrets — contains the connection string:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;PG_EXPORTER_URL=postgresql:&amp;#x2F;&amp;#x2F;pg_exporter:password@localhost:5432&amp;#x2F;postgres?sslmode=disable
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If you want socket auth and fine-grained control over collectors:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.pg-exporter = {
  enable = true;
  autoDiscovery = true;
  user = &amp;quot;postgres&amp;quot;;
  environmentFile = config.sops.secrets.pg-exporter-env.path;

  disabledCollectors = [
    &amp;quot;pgbouncer_list&amp;quot;
    &amp;quot;pgbouncer_database&amp;quot;
    &amp;quot;pgbouncer_stat&amp;quot;
    &amp;quot;pgbouncer_pool&amp;quot;
    &amp;quot;pg_tsdb_hypertable&amp;quot;
    &amp;quot;pg_citus&amp;quot;
    &amp;quot;pg_recv&amp;quot;
    &amp;quot;pg_sub&amp;quot;
    &amp;quot;pg_origin&amp;quot;
    &amp;quot;pg_pubrel&amp;quot;
    &amp;quot;pg_subrel&amp;quot;
    &amp;quot;pg_sync_standby&amp;quot;
    &amp;quot;pg_downstream&amp;quot;
    &amp;quot;pg_heartbeat&amp;quot;
  ];

  grafanaDashboard = true;
  enableLocalScraping = true;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;disabledCollectors&lt;&#x2F;code&gt; list is where you prune what you don’t need. No pgbouncer? Disable those four. No TimescaleDB or Citus? Gone. No replication? Drop the replication collectors. The exporter won’t waste time querying for metrics that don’t apply to your setup.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-nixos-module-pattern&quot;&gt;The NixOS module pattern&lt;&#x2F;h2&gt;
&lt;p&gt;This is the same pattern every time: a Go binary is built with &lt;code&gt;buildGoModule&lt;&#x2F;code&gt;, its config files land in &lt;code&gt;$out&#x2F;share&#x2F;&lt;&#x2F;code&gt;, the module assembles a config directory from Nix expressions, and integration options (&lt;code&gt;grafanaDashboard&lt;&#x2F;code&gt;, &lt;code&gt;enableLocalScraping&lt;&#x2F;code&gt;) wire into other NixOS services with &lt;code&gt;mkIf&lt;&#x2F;code&gt;. The systemd service uses &lt;code&gt;DynamicUser&lt;&#x2F;code&gt; for least privilege. Secrets stay in environment files, never in the Nix store.&lt;&#x2F;p&gt;
&lt;p&gt;If you’re packaging any Prometheus exporter for NixOS, this is the template. The specific queries and YAML files change. The module structure doesn’t.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>PostgreSQL 14→18 on NixOS — A Flake-Native Migration</title>
        <published>2026-03-19T02:00:00+00:00</published>
        <updated>2026-03-19T02:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/postgresql-14-18-nixos-migration/"/>
        <id>https://perlpimp.net/blog/postgresql-14-18-nixos-migration/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/postgresql-14-18-nixos-migration/">&lt;p&gt;You check your NixOS server and PostgreSQL is on 14.22. The server’s been running &lt;code&gt;system.stateVersion = &quot;22.05&quot;&lt;&#x2F;code&gt; since it was provisioned, and NixOS dutifully kept the default PostgreSQL version for that era. NixOS 25.11 ships PostgreSQL 17 as the default for new installs, and PG 18 is in nixpkgs. PG 14 reaches end-of-life in November 2026. Thirteen databases across multiple application instances need to move.&lt;&#x2F;p&gt;
&lt;p&gt;PostgreSQL does not support in-place major version upgrades. The on-disk format changes between major versions, so you either use &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt; — which rewrites catalog metadata while preserving data files — or do a full &lt;code&gt;pg_dumpall&lt;&#x2F;code&gt;&#x2F;restore cycle. &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt; is dramatically faster because it links or copies existing data files rather than re-inserting every row through SQL. That’s what we’ll use.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-flake-scripts-instead-of-ad-hoc-shell-commands&quot;&gt;Why flake scripts instead of ad-hoc shell commands&lt;&#x2F;h2&gt;
&lt;p&gt;The conventional approach involves SSHing into the server and running a series of commands copied from a wiki page. This has problems:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Reproducibility&lt;&#x2F;strong&gt; — ad-hoc commands are easy to mistype, especially the &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt; invocation which requires exact paths to both old and new PostgreSQL binaries.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Binary path management&lt;&#x2F;strong&gt; — on a flake-based NixOS system, you can’t rely on a &lt;code&gt;&amp;lt;nixpkgs&amp;gt;&lt;&#x2F;code&gt; channel being present, let alone pointing at the revision the system was built from. That makes &lt;code&gt;nix-build &#x27;&amp;lt;nixpkgs&amp;gt;&#x27; -A postgresql_14&lt;&#x2F;code&gt; an unreliable way to get a package path. The packages live in the Nix store, and their paths are determined at evaluation time by the flake lock file.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Version consistency&lt;&#x2F;strong&gt; — by encoding the procedure in the flake, the exact PG 14 and PG 18 store paths are baked into the scripts at build time. The scripts and the NixOS configuration reference the same nixpkgs revision, eliminating version skew.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;h2 id=&quot;flake-structure&quot;&gt;Flake structure&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;pinning-the-nixpkgs-revision&quot;&gt;Pinning the nixpkgs revision&lt;&#x2F;h3&gt;
&lt;p&gt;The flake uses a &lt;code&gt;nixpkgs-stable&lt;&#x2F;code&gt; input pinned to &lt;code&gt;nixos-25.11&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;inputs = {
  nixpkgs-stable = {
    url = &amp;quot;github:NixOS&amp;#x2F;nixpkgs&amp;#x2F;nixos-25.11&amp;quot;;
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;In the per-system block:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;let
  pkgs = nixpkgs.legacyPackages.${system};
  pkgs-stable = nixpkgs-stable.legacyPackages.${system};
in
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This matters: the server uses &lt;code&gt;nixpkgs-stable&lt;&#x2F;code&gt; (NixOS 25.11), not &lt;code&gt;nixpkgs&lt;&#x2F;code&gt; (unstable). The PostgreSQL packages baked into the scripts must come from the same nixpkgs revision that the server’s NixOS configuration uses. As long as that configuration doesn’t override or patch the package, the PG 18 binary in the upgrade script has the same store path as the one NixOS runs as a service after deployment, and is therefore the same build byte for byte.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;script-definitions&quot;&gt;Script definitions&lt;&#x2F;h3&gt;
&lt;p&gt;The scripts are defined inside the flake’s &lt;code&gt;eachDefaultSystem&lt;&#x2F;code&gt; block, guarded by &lt;code&gt;optionalAttrs&lt;&#x2F;code&gt; so they only exist on x86_64-linux:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;pgUpgradeScripts = pkgs.lib.optionalAttrs (system == &amp;quot;x86_64-linux&amp;quot;) (
  let
    pg14 = pkgs-stable.postgresql_14;
    pg18 = pkgs-stable.postgresql_18;
    oldDir = &amp;quot;&amp;#x2F;var&amp;#x2F;lib&amp;#x2F;postgresql&amp;#x2F;14&amp;quot;;
    newDir = &amp;quot;&amp;#x2F;var&amp;#x2F;lib&amp;#x2F;postgresql&amp;#x2F;18&amp;quot;;
  in
  {
    # step1, step2, step3 defined here
  }
);
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;When Nix evaluates &lt;code&gt;${pg14}&lt;&#x2F;code&gt;, it becomes something like &lt;code&gt;&#x2F;nix&#x2F;store&#x2F;abc123-postgresql-14.22&lt;&#x2F;code&gt;. When it evaluates &lt;code&gt;${pg18}&lt;&#x2F;code&gt;, it becomes &lt;code&gt;&#x2F;nix&#x2F;store&#x2F;xyz789-postgresql-18.3&lt;&#x2F;code&gt;. The resulting shell scripts contain hardcoded, fully-qualified store paths, so there is no runtime resolution, no PATH dependency, and no risk of picking up the wrong binary. Every script that interpolates both packages (steps 1 and 2) carries both PostgreSQL versions in its closure. Running &lt;code&gt;nix run .#postgresql-upgrade-14-18-step1&lt;&#x2F;code&gt; on the server therefore fetches both PG 14 and PG 18 from the binary cache (or builds them) before executing.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;exposing-as-packages-and-apps&quot;&gt;Exposing as packages and apps&lt;&#x2F;h3&gt;
&lt;p&gt;The scripts are merged into the flake’s &lt;code&gt;packages&lt;&#x2F;code&gt; output:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;packages = {
  default = myPackage;
} &amp;#x2F;&amp;#x2F; pgUpgradeScripts;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;And automatically mapped to &lt;code&gt;apps&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;apps = builtins.mapAttrs (name: pkg: {
  type = &amp;quot;app&amp;quot;;
  program = &amp;quot;${pkg}&amp;#x2F;bin&amp;#x2F;${name}&amp;quot;;
}) pgUpgradeScripts;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This enables &lt;code&gt;nix run .#postgresql-upgrade-14-18-step1&lt;&#x2F;code&gt; syntax without repeating the app definition for each script.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-three-steps-with-human-checkpoints&quot;&gt;Why three steps with human checkpoints&lt;&#x2F;h2&gt;
&lt;p&gt;The migration is deliberately split into three scripts:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Step 1&lt;&#x2F;strong&gt; runs its checks and the backup while PG 14 is live, then stops it for the &lt;code&gt;pg_upgrade --check&lt;&#x2F;code&gt; dry run. If anything fails, abort: remove the new data directory and restart PG 14. Nothing has been modified.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Step 2&lt;&#x2F;strong&gt; performs the real &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt;. It is separated so the operator reviews step 1’s output before committing.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Step 3&lt;&#x2F;strong&gt; runs after deploying the new NixOS configuration with PG 18. A &lt;code&gt;nixos-rebuild switch&lt;&#x2F;code&gt; must happen between step 2 and step 3.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;You could automate all three into one script. You probably shouldn’t. The gap between “preflight passed” and “actually rewrite my production database catalogs” is exactly where you want a human reading terminal output and deciding whether to proceed.&lt;&#x2F;p&gt;
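&lt;p&gt;If you do want the checkpoint enforced by the script itself rather than left to discipline, a small prompt at the top of steps 2 and 3 is enough. The sketch below is not part of the scripts in this post. &lt;code&gt;confirm_step&lt;&#x2F;code&gt; is a hypothetical helper that reads one line from stdin and returns success only if the operator typed the expected word:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper (not from the original scripts): succeed only if the
# operator types exactly the word passed as $1. Reads one line from stdin.
confirm_step() {
  local expected="$1" answer=""
  printf 'Type %s to continue: ' "$expected"
  read -r answer || return 1
  if [ "$answer" = "$expected" ]; then
    return 0
  fi
  echo "Aborted: got '$answer', expected '$expected'"
  return 1
}

# Example, at the top of step 2:
#   confirm_step upgrade || exit 1
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Requiring a specific word instead of &lt;code&gt;y&lt;&#x2F;code&gt; makes it harder to confirm on autopilot. Under &lt;code&gt;set -euo pipefail&lt;&#x2F;code&gt;, the &lt;code&gt;|| exit 1&lt;&#x2F;code&gt; form aborts cleanly.&lt;&#x2F;p&gt;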
&lt;h2 id=&quot;step-1-backup-and-preflight&quot;&gt;Step 1: backup and preflight&lt;&#x2F;h2&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;set -euo pipefail

echo &amp;quot;=== Step 1: Backup and preflight check ===&amp;quot;

[[ $EUID -eq 0 ]] || { echo &amp;quot;Run as root&amp;quot;; exit 1; }

echo &amp;quot;Checking current checksum status...&amp;quot;
sudo -u postgres ${pg14}&amp;#x2F;bin&amp;#x2F;pg_controldata ${oldDir} | grep -i checksum

echo &amp;quot;&amp;quot;
echo &amp;quot;Checking for MD5 passwords...&amp;quot;
sudo -u postgres ${pg14}&amp;#x2F;bin&amp;#x2F;psql -c \
  &amp;quot;SELECT rolname,
          CASE WHEN rolpassword LIKE &amp;#x27;md5%&amp;#x27;
               THEN &amp;#x27;MD5 (migrate to SCRAM!)&amp;#x27;
               ELSE &amp;#x27;OK&amp;#x27;
          END AS auth
   FROM pg_authid WHERE rolpassword IS NOT NULL;&amp;quot;

echo &amp;quot;&amp;quot;
echo &amp;quot;Checking for expression indexes...&amp;quot;
sudo -u postgres ${pg14}&amp;#x2F;bin&amp;#x2F;psql -At -c \
  &amp;quot;SELECT schemaname || &amp;#x27;.&amp;#x27; || indexname || &amp;#x27;: &amp;#x27; || indexdef
   FROM pg_indexes
   WHERE indexdef ~ &amp;#x27;\\(.*\\(&amp;#x27;&amp;quot; 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null || true

echo &amp;quot;&amp;quot;
echo &amp;quot;Checking for FTS indexes...&amp;quot;
sudo -u postgres ${pg14}&amp;#x2F;bin&amp;#x2F;psql -At -c \
  &amp;quot;SELECT schemaname || &amp;#x27;.&amp;#x27; || indexname
   FROM pg_indexes
   WHERE indexdef LIKE &amp;#x27;%tsvector%&amp;#x27;
      OR indexdef LIKE &amp;#x27;% USING gin %&amp;#x27;
      OR indexdef LIKE &amp;#x27;% USING gist %&amp;#x27;&amp;quot; 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null || true

echo &amp;quot;&amp;quot;
echo &amp;quot;Taking pg_dumpall backup...&amp;quot;
mkdir -p &amp;#x2F;var&amp;#x2F;backup
sudo -u postgres ${pg14}&amp;#x2F;bin&amp;#x2F;pg_dumpall \
  &amp;gt; &amp;quot;&amp;#x2F;var&amp;#x2F;backup&amp;#x2F;postgresql-14-pre-upgrade-$(date +%Y%m%d).sql&amp;quot;
echo &amp;quot;Backup saved to &amp;#x2F;var&amp;#x2F;backup&amp;#x2F;&amp;quot;

echo &amp;quot;&amp;quot;
echo &amp;quot;Listing databases for reference...&amp;quot;
sudo -u postgres ${pg14}&amp;#x2F;bin&amp;#x2F;psql -l

echo &amp;quot;&amp;quot;
echo &amp;quot;Stopping PostgreSQL...&amp;quot;
systemctl stop postgresql.service

echo &amp;quot;Creating socket directory...&amp;quot;
mkdir -p &amp;#x2F;var&amp;#x2F;run&amp;#x2F;postgresql
chown postgres:postgres &amp;#x2F;var&amp;#x2F;run&amp;#x2F;postgresql

checksum_status=$(sudo -u postgres ${pg14}&amp;#x2F;bin&amp;#x2F;pg_controldata ${oldDir} \
  | grep &amp;quot;Data page checksum&amp;quot; | awk &amp;#x27;{print $NF}&amp;#x27;)
initdb_flags=&amp;quot;&amp;quot;
if [ &amp;quot;$checksum_status&amp;quot; = &amp;quot;0&amp;quot; ] \
   || echo &amp;quot;$checksum_status&amp;quot; | grep -qi &amp;quot;off\|disabled&amp;quot;; then
  echo &amp;quot;Old cluster has checksums DISABLED — passing --no-data-checksums to initdb&amp;quot;
  initdb_flags=&amp;quot;--no-data-checksums&amp;quot;
fi

echo &amp;quot;Initializing new data directory...&amp;quot;
sudo -u postgres ${pg18}&amp;#x2F;bin&amp;#x2F;initdb $initdb_flags -D ${newDir}

echo &amp;quot;Running pg_upgrade --check (dry run)...&amp;quot;
cd &amp;#x2F;var&amp;#x2F;lib&amp;#x2F;postgresql
sudo -u postgres ${pg18}&amp;#x2F;bin&amp;#x2F;pg_upgrade \
  --socketdir=&amp;#x2F;var&amp;#x2F;run&amp;#x2F;postgresql \
  --old-bindir=${pg14}&amp;#x2F;bin \
  --new-bindir=${pg18}&amp;#x2F;bin \
  --old-datadir=${oldDir} \
  --new-datadir=${newDir} \
  --check

echo &amp;quot;&amp;quot;
echo &amp;quot;=== Preflight passed. Proceed to step 2. ===&amp;quot;
echo &amp;quot;NOTE: PostgreSQL is stopped. If aborting, run:&amp;quot;
echo &amp;quot;  rm -rf ${newDir} &amp;amp;&amp;amp; systemctl start postgresql.service&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;what-each-check-does&quot;&gt;What each check does&lt;&#x2F;h3&gt;
&lt;p&gt;&lt;strong&gt;Checksum status&lt;&#x2F;strong&gt; — this is the single most critical check for a 14→18 upgrade. PG 18 changed the &lt;code&gt;initdb&lt;&#x2F;code&gt; default to enable data page checksums. If the old cluster has checksums disabled (&lt;code&gt;Data page checksum version: 0&lt;&#x2F;code&gt;), the new cluster must also be initialized without them. Miss this and &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt; refuses to proceed because the checksum settings don’t match.&lt;&#x2F;p&gt;
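&lt;p&gt;The checksum branch in step 1 can also be isolated into a function. It extracts the version with one anchored &lt;code&gt;awk&lt;&#x2F;code&gt; match and treats a missing line as an error rather than an empty flag. This is a sketch that assumes &lt;code&gt;pg_controldata&lt;&#x2F;code&gt; prints the line as &lt;code&gt;Data page checksum version: N&lt;&#x2F;code&gt;, as PG 14 does. &lt;code&gt;checksum_initdb_flag&lt;&#x2F;code&gt; is a hypothetical name:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: read pg_controldata output on stdin and print the
# initdb flag needed to match the old cluster. Version 0 means checksums off.
checksum_initdb_flag() {
  local version
  version=$(awk -F: '/^Data page checksum version/ { gsub(/ /, "", $2); print $2 }')
  case "$version" in
    0) echo "--no-data-checksums" ;;  # old cluster has no checksums
    "") return 1 ;;                    # line not found: let the caller abort
    *) : ;;                            # checksums on: PG 18 default matches
  esac
}

# Example (PG14 is a placeholder for the PG 14 store path):
#   initdb_flags=$(sudo -u postgres "$PG14/bin/pg_controldata" /var/lib/postgresql/14 | checksum_initdb_flag)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;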
&lt;p&gt;&lt;strong&gt;MD5 password audit&lt;&#x2F;strong&gt; — PG 18 deprecates MD5 password hashing in favor of SCRAM-SHA-256. The query against &lt;code&gt;pg_authid&lt;&#x2F;code&gt; identifies roles still on MD5 so you can plan migration before or after the upgrade, rather than discovering deprecation warnings flooding your logs at 2 AM.&lt;&#x2F;p&gt;
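&lt;p&gt;The audit query only prints a report. If you want a script to stop while MD5 hashes remain, the same check can run over unaligned &lt;code&gt;psql -At&lt;&#x2F;code&gt; output, where columns are separated by &lt;code&gt;|&lt;&#x2F;code&gt; by default. This is a sketch: &lt;code&gt;md5_roles&lt;&#x2F;code&gt; is a hypothetical helper, and it assumes no role name contains a &lt;code&gt;|&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: read "rolname|rolpassword" rows on stdin, print each
# role whose stored hash is MD5, and return 1 if there was at least one.
md5_roles() {
  local role hash found=0
  while IFS='|' read -r role hash; do
    case "$hash" in
      md5*) echo "$role"; found=1 ;;
    esac
  done
  [ "$found" -eq 0 ]
}

# Example (run as root on the server):
#   sudo -u postgres psql -At -c \
#     "SELECT rolname, rolpassword FROM pg_authid WHERE rolpassword IS NOT NULL" \
#     | md5_roles || echo "MD5 roles remain: migrate them to SCRAM-SHA-256"
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;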
&lt;p&gt;&lt;strong&gt;Expression indexes&lt;&#x2F;strong&gt; — PG 17 tightened security around &lt;code&gt;search_path&lt;&#x2F;code&gt; in functions used by expression indexes and materialized views. The check identifies expression indexes so you can verify they’ll survive the upgrade.&lt;&#x2F;p&gt;
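&lt;p&gt;The step 1 query finds expression indexes with the regular expression &lt;code&gt;\(.*\(&lt;&#x2F;code&gt;. The doubled backslashes in the script are shell escaping. The pattern matches a second opening parenthesis after the first, as in &lt;code&gt;USING btree (lower(email))&lt;&#x2F;code&gt;. It is a coarse filter: a partial index with a &lt;code&gt;WHERE (...)&lt;&#x2F;code&gt; clause matches too. The same heuristic in plain bash, under a hypothetical function name, makes that behavior easy to try against sample definitions:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper mirroring the SQL heuristic: true when a second "("
# follows the first one anywhere in the index definition.
looks_like_expression_index() {
  local re='\(.*\('
  [[ $1 =~ $re ]]
}

# Returns 0 (expression index):
#   looks_like_expression_index 'CREATE INDEX i ON public.t USING btree (lower(email))'
# Returns 1 (plain column):
#   looks_like_expression_index 'CREATE INDEX i ON public.t USING btree (email)'
# Returns 0 as well (partial index, a false positive):
#   looks_like_expression_index 'CREATE INDEX i ON public.t USING btree (a) WHERE (b IS NULL)'
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;For an exact list, query &lt;code&gt;pg_index&lt;&#x2F;code&gt; instead: &lt;code&gt;indexprs IS NOT NULL&lt;&#x2F;code&gt; is true only for indexes that have expression columns.&lt;&#x2F;p&gt;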
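&lt;p&gt;&lt;strong&gt;FTS&#x2F;GIN&#x2F;GiST indexes&lt;&#x2F;strong&gt; — PG 18 changed full-text search (and &lt;code&gt;pg_trgm&lt;&#x2F;code&gt;) to use the cluster’s default collation provider instead of always using libc. The PG 18 release notes recommend reindexing FTS and trigram indexes after &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt; on clusters whose default provider is not libc, because character handling may differ. A cluster created under PG 14 uses libc, so here the reindex is a precaution rather than a strict requirement. Identifying these indexes upfront tells you what step 3’s reindex will cover.&lt;&#x2F;p&gt;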
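&lt;p&gt;To separate GIN and GiST indexes from everything else, match the access method in the &lt;code&gt;USING&lt;&#x2F;code&gt; clause of &lt;code&gt;pg_indexes.indexdef&lt;&#x2F;code&gt;. Searching for &lt;code&gt;gin&lt;&#x2F;code&gt; anywhere in the string also hits names like &lt;code&gt;login&lt;&#x2F;code&gt;. This bash sketch assumes the &lt;code&gt;... USING method (...)&lt;&#x2F;code&gt; layout that &lt;code&gt;pg_get_indexdef&lt;&#x2F;code&gt; produces. Both function names are hypothetical:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helpers: extract the access method from an index definition
# and test whether it is GIN or GiST.
index_method() {
  local re=' USING ([a-z_]+) '
  if [[ $1 =~ $re ]]; then
    echo "${BASH_REMATCH[1]}"
  else
    return 1
  fi
}

is_gin_or_gist() {
  case "$(index_method "$1")" in
    gin|gist) return 0 ;;
    *) return 1 ;;
  esac
}

# Returns 0:
#   is_gin_or_gist 'CREATE INDEX docs_fts ON public.docs USING gin (body_tsv)'
# Returns 1, even though "login" contains "gin":
#   is_gin_or_gist 'CREATE INDEX users_login ON public.users USING btree (login)'
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;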
&lt;p&gt;&lt;strong&gt;pg_dumpall&lt;&#x2F;strong&gt; — a full logical backup as a safety net. If something goes catastrophically wrong, this SQL dump restores the entire cluster to a fresh PG 14 installation. The old data directory is also preserved until explicitly deleted in cleanup, providing a second recovery path.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;Conditional initdb&lt;&#x2F;strong&gt; — creates the PG 18 data directory at &lt;code&gt;&#x2F;var&#x2F;lib&#x2F;postgresql&#x2F;18&lt;&#x2F;code&gt;, passing &lt;code&gt;--no-data-checksums&lt;&#x2F;code&gt; only if the old cluster has them disabled. NixOS convention uses &lt;code&gt;&#x2F;var&#x2F;lib&#x2F;postgresql&#x2F;&amp;lt;major-version&amp;gt;&lt;&#x2F;code&gt; as the data directory.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;pg_upgrade --check&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; — the &lt;code&gt;--check&lt;&#x2F;code&gt; flag performs a dry run. It verifies binary compatibility, checks for data types that changed representation, validates that required shared libraries exist, and confirms the clusters are upgrade-compatible, all without modifying any data. If this passes, the real upgrade is very likely to pass too. It can still fail on things the check doesn’t cover, such as running out of disk space while copying data files.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;step-2-the-actual-migration&quot;&gt;Step 2: the actual migration&lt;&#x2F;h2&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;set -euo pipefail

echo &amp;quot;=== Step 2: Run pg_upgrade ===&amp;quot;

[[ $EUID -eq 0 ]] || { echo &amp;quot;Run as root&amp;quot;; exit 1; }

echo &amp;quot;Ensuring PostgreSQL is stopped...&amp;quot;
systemctl stop postgresql.service 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null || true

mkdir -p &amp;#x2F;var&amp;#x2F;run&amp;#x2F;postgresql
chown postgres:postgres &amp;#x2F;var&amp;#x2F;run&amp;#x2F;postgresql

echo &amp;quot;Running pg_upgrade...&amp;quot;
cd &amp;#x2F;var&amp;#x2F;lib&amp;#x2F;postgresql
sudo -u postgres ${pg18}&amp;#x2F;bin&amp;#x2F;pg_upgrade \
  --socketdir=&amp;#x2F;var&amp;#x2F;run&amp;#x2F;postgresql \
  --old-bindir=${pg14}&amp;#x2F;bin \
  --new-bindir=${pg18}&amp;#x2F;bin \
  --old-datadir=${oldDir} \
  --new-datadir=${newDir}

echo &amp;quot;&amp;quot;
echo &amp;quot;=== pg_upgrade completed. ===&amp;quot;
echo &amp;quot;&amp;quot;
echo &amp;quot;Next steps:&amp;quot;
echo &amp;quot;  1. Update postgresql config to set package = pkgs.postgresql_18&amp;quot;
echo &amp;quot;  2. Commit, push, deploy&amp;quot;
echo &amp;quot;  3. Run step 3 for post-upgrade verification&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This is the short one. It does exactly one thing: run &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt; for real. The operation proceeds in phases internally:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Global objects&lt;&#x2F;strong&gt; — roles, tablespaces, and other cluster-wide objects are dumped from the old cluster and restored into the new one.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Schema dump&#x2F;restore&lt;&#x2F;strong&gt; — each database’s schema is dumped as SQL from PG 14 and replayed into PG 18. This is where catalog format differences are resolved.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Data file copy&lt;&#x2F;strong&gt; — the actual table and index data files are copied from old to new. Since we’re on the same filesystem, this is a straightforward copy. The data files are forward-compatible at the page level.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Transaction metadata&lt;&#x2F;strong&gt; — XID counters, multixact state, and WAL position are carried forward for transactional continuity.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Extension update script&lt;&#x2F;strong&gt; — &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt; generates &lt;code&gt;update_extensions.sql&lt;&#x2F;code&gt; for extensions that need &lt;code&gt;ALTER EXTENSION ... UPDATE&lt;&#x2F;code&gt;. Step 3 applies this.&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
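&lt;p&gt;The operation is fast for moderately sized databases because &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt; doesn’t re-process row data: it rebuilds the catalogs and copies the data files as-is. In the default copy mode, that copy still scales with data size. &lt;code&gt;--link&lt;&#x2F;code&gt; uses hard links instead and finishes almost instantly, but once the new cluster has been started, the old one can no longer be used safely.&lt;&#x2F;p&gt;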
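&lt;p&gt;One practical consequence of copy mode is that the filesystem holding &lt;code&gt;&#x2F;var&#x2F;lib&#x2F;postgresql&lt;&#x2F;code&gt; needs room for a second copy of the old cluster’s data files. A pre-flight check along these lines could go into step 1. The function name and the 10% headroom are assumptions, not part of the original scripts:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical pre-flight check: copy mode writes a second copy of the data
# files, so require the old cluster's size plus 10% headroom (an assumption).
# Both arguments are sizes in KiB.
enough_space_for_copy() {
  local old_kb="$1" avail_kb="$2"
  local needed_kb=$(( old_kb + old_kb / 10 ))
  [ "$avail_kb" -ge "$needed_kb" ]
}

# Example on the server (GNU coreutils):
#   old_kb=$(du -sk /var/lib/postgresql/14 | cut -f1)
#   avail_kb=$(df -Pk /var/lib/postgresql | awk 'NR==2 {print $4}')
#   enough_space_for_copy "$old_kb" "$avail_kb" || { echo "Not enough free space"; exit 1; }
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;du&lt;&#x2F;code&gt; figure includes WAL and other files that &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt; doesn’t copy, so the check errs on the safe side.&lt;&#x2F;p&gt;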
&lt;h2 id=&quot;the-deploy&quot;&gt;The deploy&lt;&#x2F;h2&gt;
&lt;p&gt;Between step 2 and step 3, deploy the new NixOS configuration. The PostgreSQL module change is minimal:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# services&amp;#x2F;postgresql.nix
{ pkgs, ... }:
{
  services.postgresql = {
    enable = true;
    package = pkgs.postgresql_18;
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;nixos-rebuild switch&lt;&#x2F;code&gt; activates this. NixOS builds a new system closure referencing &lt;code&gt;postgresql-18.3&lt;&#x2F;code&gt; instead of &lt;code&gt;postgresql-14.22&lt;&#x2F;code&gt;, generates new systemd unit files pointing to the PG 18 binary, starts &lt;code&gt;postgresql.service&lt;&#x2F;code&gt; against the &lt;code&gt;&#x2F;var&#x2F;lib&#x2F;postgresql&#x2F;18&lt;&#x2F;code&gt; data directory that &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt; prepared, and starts all dependent services via systemd dependency ordering. One atomic transition.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;step-3-post-upgrade-verification&quot;&gt;Step 3: post-upgrade verification&lt;&#x2F;h2&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;set -euo pipefail

echo &amp;quot;=== Step 3: Post-upgrade verification ===&amp;quot;

[[ $EUID -eq 0 ]] || { echo &amp;quot;Run as root&amp;quot;; exit 1; }

echo &amp;quot;Checking PostgreSQL service status...&amp;quot;
systemctl status postgresql.service --no-pager

echo &amp;quot;&amp;quot;
echo &amp;quot;Checking PostgreSQL version...&amp;quot;
sudo -u postgres ${pg18}&amp;#x2F;bin&amp;#x2F;psql -c &amp;quot;SELECT version();&amp;quot;

echo &amp;quot;&amp;quot;
echo &amp;quot;Listing databases...&amp;quot;
sudo -u postgres ${pg18}&amp;#x2F;bin&amp;#x2F;psql -l

if [ -f &amp;#x2F;var&amp;#x2F;lib&amp;#x2F;postgresql&amp;#x2F;update_extensions.sql ]; then
  echo &amp;quot;&amp;quot;
  echo &amp;quot;Applying extension updates...&amp;quot;
  sudo -u postgres ${pg18}&amp;#x2F;bin&amp;#x2F;psql \
    -f &amp;#x2F;var&amp;#x2F;lib&amp;#x2F;postgresql&amp;#x2F;update_extensions.sql
fi

echo &amp;quot;&amp;quot;
echo &amp;quot;Reindexing all databases (precaution for the PG 18 FTS collation change)...&amp;quot;
for db in $(sudo -u postgres ${pg18}&amp;#x2F;bin&amp;#x2F;psql -At -c \
  &amp;quot;SELECT datname FROM pg_database WHERE datistemplate = false;&amp;quot;); do
  echo &amp;quot;  Reindexing $db...&amp;quot;
  sudo -u postgres ${pg18}&amp;#x2F;bin&amp;#x2F;reindexdb &amp;quot;$db&amp;quot; \
    || echo &amp;quot;  WARNING: reindex of $db failed&amp;quot;
done

echo &amp;quot;&amp;quot;
echo &amp;quot;Checking for MD5 passwords (deprecated in PG18)...&amp;quot;
sudo -u postgres ${pg18}&amp;#x2F;bin&amp;#x2F;psql -c \
  &amp;quot;SELECT rolname,
          CASE WHEN rolpassword LIKE &amp;#x27;md5%&amp;#x27;
               THEN &amp;#x27;MD5 — MIGRATE TO SCRAM!&amp;#x27;
               ELSE &amp;#x27;SCRAM (ok)&amp;#x27;
          END AS auth
   FROM pg_authid WHERE rolpassword IS NOT NULL;&amp;quot;

echo &amp;quot;&amp;quot;
echo &amp;quot;Running ANALYZE on all databases...&amp;quot;
sudo -u postgres ${pg18}&amp;#x2F;bin&amp;#x2F;vacuumdb --all --analyze-only

echo &amp;quot;&amp;quot;
echo &amp;quot;Checking dependent services...&amp;quot;
for svc in service-a service-b service-c analytics webmail; do
  status=$(systemctl is-active &amp;quot;$svc&amp;quot; 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null || echo &amp;quot;not found&amp;quot;)
  printf &amp;quot;  %-25s %s\n&amp;quot; &amp;quot;$svc&amp;quot; &amp;quot;$status&amp;quot;
done

echo &amp;quot;&amp;quot;
echo &amp;quot;=== Verification complete ===&amp;quot;
echo &amp;quot;&amp;quot;
echo &amp;quot;If everything looks good:&amp;quot;
echo &amp;quot;  1. Remove old data directory: rm -rf ${oldDir}&amp;quot;
echo &amp;quot;  2. Delete pg_upgrade logs&amp;#x2F;scripts in &amp;#x2F;var&amp;#x2F;lib&amp;#x2F;postgresql&amp;#x2F;&amp;quot;
echo &amp;quot;  3. If MD5 passwords were found, migrate them to SCRAM-SHA-256&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;strong&gt;Version confirmation&lt;&#x2F;strong&gt; — &lt;code&gt;SELECT version()&lt;&#x2F;code&gt; should return “PostgreSQL 18.x”, confirming the new binary is serving the migrated data.&lt;&#x2F;p&gt;
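&lt;p&gt;&lt;code&gt;version()&lt;&#x2F;code&gt; returns a free-form string meant for humans. For a check a script can act on, &lt;code&gt;SHOW server_version_num&lt;&#x2F;code&gt; returns an integer. Since PostgreSQL 10, that integer encodes the major version times 10000 plus the minor version (180003 for 18.3). Here is a sketch with a hypothetical helper name:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: major version from server_version_num (PG 10+).
pg_major_from_num() {
  echo $(( $1 / 10000 ))
}

# Example in step 3 (PG18 is a placeholder for the PG 18 store path):
#   num=$(sudo -u postgres "$PG18/bin/psql" -At -c 'SHOW server_version_num')
#   [ "$(pg_major_from_num "$num")" -eq 18 ] || { echo "Server is not on PG 18"; exit 1; }
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;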
&lt;p&gt;&lt;strong&gt;Extension updates&lt;&#x2F;strong&gt; — applies &lt;code&gt;update_extensions.sql&lt;&#x2F;code&gt; generated by &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt;. This runs &lt;code&gt;ALTER EXTENSION ... UPDATE&lt;&#x2F;code&gt; for extensions whose catalog entries need updating (e.g., citext).&lt;&#x2F;p&gt;
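&lt;p&gt;&lt;strong&gt;Full reindex&lt;&#x2F;strong&gt; — every database is reindexed with &lt;code&gt;reindexdb&lt;&#x2F;code&gt;. The PG 18 release notes recommend reindexing full-text search and &lt;code&gt;pg_trgm&lt;&#x2F;code&gt; indexes after &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt; when the cluster’s default collation provider is not libc. For a libc cluster coming from PG 14, this is cheap insurance rather than a strict requirement. Either way, rebuilding every index from the table data rules out stale index contents.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;ANALYZE&lt;&#x2F;strong&gt; — &lt;code&gt;vacuumdb --all --analyze-only&lt;&#x2F;code&gt; refreshes the query planner’s statistics for every table in every database. Before PG 18, &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt; did not transfer planner statistics at all, so the planner was left guessing until autovacuum caught up. PG 18’s &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt; transfers most optimizer statistics but not extended statistics (those created with &lt;code&gt;CREATE STATISTICS&lt;&#x2F;code&gt;). A full ANALYZE is still the simple way to make sure nothing is missing. PG 18 also adds a &lt;code&gt;--missing-stats-only&lt;&#x2F;code&gt; option to &lt;code&gt;vacuumdb&lt;&#x2F;code&gt; if you only want to fill the gaps. You don’t want to discover stale statistics via a production latency spike.&lt;&#x2F;p&gt;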
&lt;p&gt;&lt;strong&gt;Service health check&lt;&#x2F;strong&gt; — confirms all dependent services reconnected and are running after the PG version change.&lt;&#x2F;p&gt;
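&lt;p&gt;The loop in step 3 prints each service’s state but doesn’t fail when one is down. If you want a non-zero exit instead, its &lt;code&gt;name status&lt;&#x2F;code&gt; lines can be piped through a small filter. This is a sketch: &lt;code&gt;all_services_active&lt;&#x2F;code&gt; is a hypothetical name, and the service names are the same placeholders used in step 3:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: read "name status" lines on stdin (for example the
# output of the step 3 loop) and return 1 if any status is not "active".
all_services_active() {
  local name state bad=0
  while read -r name state; do
    [ -n "$name" ] || continue
    if [ "$state" != "active" ]; then
      echo "not active: $name ($state)"
      bad=1
    fi
  done
  [ "$bad" -eq 0 ]
}

# Example:
#   for svc in service-a service-b service-c analytics webmail; do
#     echo "$svc $(systemctl is-active "$svc" || true)"
#   done | all_services_active
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;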
&lt;h2 id=&quot;what-makes-this-nixos-specific&quot;&gt;What makes this NixOS-specific&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;strong&gt;No package manager state&lt;&#x2F;strong&gt; — on Debian you’d &lt;code&gt;apt install postgresql-18&lt;&#x2F;code&gt;, manage &lt;code&gt;pg_lsclusters&lt;&#x2F;code&gt;, deal with the &lt;code&gt;postgresql-common&lt;&#x2F;code&gt; wrapper, and worry about APT restarting services. On NixOS the package is a store path. Installing PG 18 doesn’t affect the running PG 14 until you activate a new system configuration.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;stateVersion controls defaults&lt;&#x2F;strong&gt; — the server was still on PG 14 despite running NixOS 25.11 which ships PG 17 for new installs. This is by design — NixOS won’t silently change your database version. The explicit &lt;code&gt;package = pkgs.postgresql_18;&lt;&#x2F;code&gt; pin overrides the stateVersion default and documents the intentional upgrade.&lt;&#x2F;p&gt;
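&lt;p&gt;&lt;strong&gt;Atomic system activation&lt;&#x2F;strong&gt; — &lt;code&gt;nixos-rebuild switch&lt;&#x2F;code&gt; doesn’t upgrade PostgreSQL in isolation. It switches the entire system (PostgreSQL, dependent services, systemd units, environment) to a new generation in a single operation. A failed activation is not rolled back automatically. However, &lt;code&gt;nixos-rebuild switch --rollback&lt;&#x2F;code&gt; returns to the previous generation with PG 14 configured, and because &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt; ran in copy mode, the old data directory is still intact for it. Any writes PG 18 accepted in the meantime would not be in that directory.&lt;&#x2F;p&gt;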
&lt;p&gt;&lt;strong&gt;Flake-pinned binaries&lt;&#x2F;strong&gt; — the upgrade scripts and the NixOS configuration derive from the same &lt;code&gt;flake.lock&lt;&#x2F;code&gt;. The PG 18 binary used for &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt; is byte-for-byte identical to the one NixOS runs as a service. On a traditional system, there’s always a risk that the PG 18 you installed for the upgrade is a slightly different build than what the package manager configures as the service.&lt;&#x2F;p&gt;
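&lt;p&gt;&lt;strong&gt;Garbage collection awareness&lt;&#x2F;strong&gt; — after the upgrade, &lt;code&gt;nix-collect-garbage&lt;&#x2F;code&gt; will eventually remove the PG 14 store path once nothing roots it. Defining the upgrade scripts in the flake does not create a GC root, and neither does &lt;code&gt;nix run&lt;&#x2F;code&gt;. What keeps PG 14 on disk is either an older system generation that still references it or an explicit root, such as the &lt;code&gt;result&lt;&#x2F;code&gt; link from &lt;code&gt;nix build&lt;&#x2F;code&gt;. What the scripts do guarantee is that the exact PG 14 build stays pinned by &lt;code&gt;flake.lock&lt;&#x2F;code&gt;. As long as they remain in the flake, &lt;code&gt;nix build .#postgresql-upgrade-14-18-step1&lt;&#x2F;code&gt; brings PG 14 back from the binary cache (or rebuilds it) without running anything, and its &lt;code&gt;result&lt;&#x2F;code&gt; link roots it. Remove the scripts and that pin goes with them. Your safety net exists exactly as long as you keep the scripts around, and it disappears when you clean up.&lt;&#x2F;p&gt;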
&lt;h2 id=&quot;cleanup&quot;&gt;Cleanup&lt;&#x2F;h2&gt;
&lt;p&gt;After verification confirms everything is healthy:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Remove &lt;code&gt;&#x2F;var&#x2F;lib&#x2F;postgresql&#x2F;14&lt;&#x2F;code&gt; — the old data directory&lt;&#x2F;li&gt;
&lt;li&gt;Delete &lt;code&gt;delete_old_cluster.sh&lt;&#x2F;code&gt; — &lt;code&gt;pg_upgrade&lt;&#x2F;code&gt;’s generated cleanup script&lt;&#x2F;li&gt;
&lt;li&gt;Delete &lt;code&gt;update_extensions.sql&lt;&#x2F;code&gt; — already applied&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The final state of &lt;code&gt;&#x2F;var&#x2F;lib&#x2F;postgresql&#x2F;&lt;&#x2F;code&gt; should contain only the &lt;code&gt;18&lt;&#x2F;code&gt; directory. Remove the upgrade scripts from your flake when you’re confident you won’t need to roll back. Until then, they serve as both documentation and a safety net — keeping PG 14 in the Nix store costs a few hundred megabytes and buys you the option to investigate if something subtle surfaces later.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Building Portable Rust CLI Binaries with Nix — Static Linux, Portable macOS</title>
        <published>2026-03-18T14:00:00+00:00</published>
        <updated>2026-03-18T14:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/portable-rust-binaries-nix/"/>
        <id>https://perlpimp.net/blog/portable-rust-binaries-nix/</id>
        
<content type="html" xml:base="https://perlpimp.net/blog/portable-rust-binaries-nix/">&lt;p&gt;You build a Rust binary with Nix. It works on your machine. Ship it to a colleague’s Linux box and it fails to start, because its ELF interpreter is a glibc dynamic loader under &lt;code&gt;&#x2F;nix&#x2F;store&lt;&#x2F;code&gt; that doesn’t exist there. Ship the macOS binary to someone without Nix and it fails with &lt;code&gt;dyld: Library not loaded: &#x2F;nix&#x2F;store&#x2F;...&#x2F;libiconv.2.dylib&lt;&#x2F;code&gt;. The binary has Nix store paths burned into its dynamic linker references.&lt;&#x2F;p&gt;
&lt;p&gt;What you actually want:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Linux&lt;&#x2F;strong&gt;: a truly static binary. Zero runtime deps. Runs on any distro, any container, bare metal.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;macOS&lt;&#x2F;strong&gt;: a portable binary that only depends on system libraries (&lt;code&gt;libSystem&lt;&#x2F;code&gt;, &lt;code&gt;libiconv&lt;&#x2F;code&gt;) at their standard paths, not Nix store paths.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;This post covers how to get both from a single flake, plus cross-compilation between platforms.&lt;&#x2F;p&gt;
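&lt;p&gt;On macOS, &lt;code&gt;otool -L&lt;&#x2F;code&gt; lists the libraries a binary loads, which makes the portability goal easy to verify. A small filter can fail a CI job when any load path still points into &lt;code&gt;&#x2F;nix&#x2F;store&lt;&#x2F;code&gt;. This is a sketch, and &lt;code&gt;no_nix_store_refs&lt;&#x2F;code&gt; is a hypothetical name:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Hypothetical helper: read `otool -L` output on stdin and return 1 if any
# load command still references /nix/store, printing the offending lines.
no_nix_store_refs() {
  local line bad=0
  while IFS= read -r line; do
    case "$line" in
      *"/nix/store/"*) echo "store reference: $line"; bad=1 ;;
    esac
  done
  [ "$bad" -eq 0 ]
}

# Example on macOS; tail skips otool's first line, which is the binary's own path:
#   otool -L result/bin/my-cli | tail -n +2 | no_nix_store_refs
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;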
&lt;h2 id=&quot;the-baseline-package&quot;&gt;The baseline package&lt;&#x2F;h2&gt;
&lt;p&gt;Start with a standard &lt;code&gt;rustPlatform.buildRustPackage&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;packages.my-cli = pkgs.rustPlatform.buildRustPackage {
  pname = &amp;quot;my-cli&amp;quot;;
  version = &amp;quot;0.1.0&amp;quot;;
  src = .&amp;#x2F;.;
  cargoLock.lockFile = .&amp;#x2F;Cargo.lock;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If your CLI lives in a subdirectory of a monorepo, use &lt;code&gt;buildAndTestSubdir&lt;&#x2F;code&gt; and copy the lockfile up:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;packages.my-cli = pkgs.rustPlatform.buildRustPackage {
  pname = &amp;quot;my-cli&amp;quot;;
  version = &amp;quot;0.1.0&amp;quot;;
  src = .&amp;#x2F;.;
  buildAndTestSubdir = &amp;quot;cli&amp;quot;;
  cargoLock.lockFile = .&amp;#x2F;cli&amp;#x2F;Cargo.lock;

  postUnpack = &amp;#x27;&amp;#x27;
    cp $sourceRoot&amp;#x2F;cli&amp;#x2F;Cargo.lock $sourceRoot&amp;#x2F;Cargo.lock
  &amp;#x27;&amp;#x27;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This builds a working binary per platform via &lt;code&gt;eachDefaultSystem&lt;&#x2F;code&gt;, but it’s dynamically linked and not portable.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;static-builds-platform-conditional-logic&quot;&gt;Static builds: platform-conditional logic&lt;&#x2F;h2&gt;
&lt;p&gt;Linux and macOS need fundamentally different approaches. Use &lt;code&gt;stdenv.hostPlatform.isLinux&lt;&#x2F;code&gt; to branch:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;packages.my-cli-static =
  if pkgs.stdenv.hostPlatform.isLinux then
    # Fully static musl binary
    pkgs.pkgsStatic.rustPlatform.buildRustPackage {
      pname = &amp;quot;my-cli&amp;quot;;
      version = &amp;quot;0.1.0&amp;quot;;
      src = .&amp;#x2F;.;
      cargoLock.lockFile = .&amp;#x2F;Cargo.lock;
      doCheck = false;
    }
  else
    # macOS: static Rust runtime, rewrite dylib paths for portability
    pkgs.rustPlatform.buildRustPackage {
      pname = &amp;quot;my-cli&amp;quot;;
      version = &amp;quot;0.1.0&amp;quot;;
      src = .&amp;#x2F;.;
      cargoLock.lockFile = .&amp;#x2F;Cargo.lock;
      RUSTFLAGS = &amp;quot;-C target-feature=+crt-static&amp;quot;;
      nativeBuildInputs = [ pkgs.darwin.cctools ];
      doCheck = false;

      postInstall = &amp;#x27;&amp;#x27;
        install_name_tool -change \
          ${pkgs.libiconv}&amp;#x2F;lib&amp;#x2F;libiconv.2.dylib \
          &amp;#x2F;usr&amp;#x2F;lib&amp;#x2F;libiconv.2.dylib \
          $out&amp;#x2F;bin&amp;#x2F;my-cli
      &amp;#x27;&amp;#x27;;
    };
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Two completely different strategies behind the same attribute name. Let’s break each one down.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;linux-pkgsstatic-and-musl&quot;&gt;Linux: pkgsStatic and musl&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;code&gt;pkgs.pkgsStatic&lt;&#x2F;code&gt; is a variant of the nixpkgs package set that, on Linux, targets musl instead of glibc and forces static linking for every package it builds. When you use &lt;code&gt;pkgs.pkgsStatic.rustPlatform.buildRustPackage&lt;&#x2F;code&gt;, the Rust toolchain links against musl libc statically. The result is a single binary with zero dynamic dependencies:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;$ file result&amp;#x2F;bin&amp;#x2F;my-cli
result&amp;#x2F;bin&amp;#x2F;my-cli: ELF 64-bit LSB executable, x86-64, statically linked, stripped

$ ldd result&amp;#x2F;bin&amp;#x2F;my-cli
  not a dynamic executable
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This binary runs on any Linux system with the same CPU architecture: Alpine, Ubuntu, Debian, RHEL, bare containers, you name it.&lt;&#x2F;p&gt;
&lt;p&gt;Disable tests (&lt;code&gt;doCheck = false&lt;&#x2F;code&gt;) for the static variant. Test suites sometimes rely on dynamic loading or other behavior that breaks under static musl linking, and when a build goes through a cross toolchain the test binaries may not be able to run on the build machine at all. Run your tests in the dynamic build instead.&lt;&#x2F;p&gt;
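&lt;p&gt;One way to keep those tests in the pipeline is to expose the dynamic package as a flake check. A minimal sketch, assuming it sits inside the same &lt;code&gt;eachDefaultSystem&lt;&#x2F;code&gt; body as the packages (so &lt;code&gt;self&lt;&#x2F;code&gt; and &lt;code&gt;system&lt;&#x2F;code&gt; are in scope); the check name is arbitrary:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# `nix flake check` builds the checks for the current system.
# buildRustPackage runs `cargo test` in its checkPhase by default
# (doCheck defaults to true), so building the dynamic package runs the tests.
checks.my-cli-tests = self.packages.${system}.my-cli;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;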
&lt;h2 id=&quot;macos-why-you-can-t-go-fully-static&quot;&gt;macOS: why you can’t go fully static&lt;&#x2F;h2&gt;
&lt;p&gt;macOS doesn’t support fully static binaries. Apple requires programs to link dynamically against &lt;code&gt;libSystem.B.dylib&lt;&#x2F;code&gt;, which provides libc and the system-call wrappers; the raw kernel interface is not a stable ABI. There’s no musl equivalent for Darwin.&lt;&#x2F;p&gt;
&lt;p&gt;What you &lt;em&gt;can&lt;&#x2F;em&gt; do:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Statically link the Rust&#x2F;C runtime with &lt;code&gt;RUSTFLAGS = &quot;-C target-feature=+crt-static&quot;&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Rewrite any remaining Nix store dylib references to system paths&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;&lt;code&gt;install_name_tool&lt;&#x2F;code&gt; (from &lt;code&gt;pkgs.darwin.cctools&lt;&#x2F;code&gt;) rewrites the Mach-O load commands. After rewriting, &lt;code&gt;otool -L&lt;&#x2F;code&gt; should show only system paths:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;$ otool -L result&amp;#x2F;bin&amp;#x2F;my-cli
result&amp;#x2F;bin&amp;#x2F;my-cli:
  &amp;#x2F;usr&amp;#x2F;lib&amp;#x2F;libiconv.2.dylib (...)
  &amp;#x2F;usr&amp;#x2F;lib&amp;#x2F;libSystem.B.dylib (...)
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
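&lt;p&gt;No &lt;code&gt;&#x2F;nix&#x2F;store&#x2F;...&lt;&#x2F;code&gt; references remain, so the binary runs on other Macs of the same architecture, as long as their macOS version meets the deployment target it was built for.&lt;&#x2F;p&gt;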
&lt;p&gt;If your binary links against additional C libraries, you’ll need one &lt;code&gt;install_name_tool -change&lt;&#x2F;code&gt; per library. Check &lt;code&gt;otool -L&lt;&#x2F;code&gt; on the binary before rewriting to see what needs fixing.&lt;&#x2F;p&gt;
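&lt;p&gt;If that list grows, &lt;code&gt;otool -L&lt;&#x2F;code&gt; itself can drive the rewrite. The sketch below is a hypothetical replacement for the &lt;code&gt;postInstall&lt;&#x2F;code&gt; above. It assumes every Nix store library it finds has a same-named counterpart in &lt;code&gt;&#x2F;usr&#x2F;lib&lt;&#x2F;code&gt; on the target Macs. That holds for libiconv but not for every library, so check the &lt;code&gt;otool -L&lt;&#x2F;code&gt; output before relying on it.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;postInstall = &amp;#x27;&amp;#x27;
  bin=$out&amp;#x2F;bin&amp;#x2F;my-cli
  # otool -L prints the binary&amp;#x27;s own path first, then one dependency per
  # line with the install name in the first column.
  for lib in $(otool -L &amp;quot;$bin&amp;quot; | awk &amp;#x27;NR &amp;gt; 1 { print $1 }&amp;#x27; | grep &amp;#x27;^&amp;#x2F;nix&amp;#x2F;store&amp;#x2F;&amp;#x27;); do
    # Assumption: macOS ships a library with the same file name in &amp;#x2F;usr&amp;#x2F;lib.
    install_name_tool -change &amp;quot;$lib&amp;quot; &amp;quot;&amp;#x2F;usr&amp;#x2F;lib&amp;#x2F;$(basename &amp;quot;$lib&amp;quot;)&amp;quot; &amp;quot;$bin&amp;quot;
  done
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;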
&lt;h2 id=&quot;cross-compilation&quot;&gt;Cross-compilation&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;building-linux-from-macos&quot;&gt;Building Linux from macOS&lt;&#x2F;h3&gt;
&lt;p&gt;With &lt;code&gt;eachDefaultSystem&lt;&#x2F;code&gt;, each system gets its own package set. To build a Linux binary from macOS:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix build .#packages.x86_64-linux.my-cli-static
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
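&lt;p&gt;Strictly speaking, this is not cross-compilation. &lt;code&gt;eachDefaultSystem&lt;&#x2F;code&gt; evaluates the &lt;code&gt;x86_64-linux&lt;&#x2F;code&gt; package set natively, so the derivation has to be built by an &lt;code&gt;x86_64-linux&lt;&#x2F;code&gt; machine. Nix substitutes it from a binary cache if it can; otherwise it hands the build to a configured remote builder (for example nix-darwin’s &lt;code&gt;linux-builder&lt;&#x2F;code&gt;), and it fails if none is available. With a builder in place, you get a static Linux ELF binary from a command run on your Mac.&lt;&#x2F;p&gt;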
&lt;p&gt;For aarch64 Linux (ARM servers, Raspberry Pi, Graviton):&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix build .#packages.aarch64-linux.my-cli-static
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
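&lt;p&gt;If you have no aarch64 builder, a true cross-compile from an x86_64 Linux host is the better-trodden path. Below is a minimal sketch using nixpkgs’ &lt;code&gt;pkgsCross&lt;&#x2F;code&gt; package sets. The attribute name is hypothetical, and whether it builds cleanly can depend on your nixpkgs revision and crate dependencies:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# Hypothetical extra output, meant to be evaluated on an x86_64-linux host:
# pkgsCross.aarch64-multiplatform targets aarch64 Linux, and
# .pkgsStatic on top of it switches that target to static musl.
packages.my-cli-static-aarch64 =
  pkgs.pkgsCross.aarch64-multiplatform.pkgsStatic.rustPlatform.buildRustPackage {
    pname = &amp;quot;my-cli&amp;quot;;
    version = &amp;quot;0.1.0&amp;quot;;
    src = .&amp;#x2F;.;
    cargoLock.lockFile = .&amp;#x2F;Cargo.lock;
    doCheck = false;  # aarch64 test binaries may not run on the x86_64 host
  };
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;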
&lt;h3 id=&quot;building-macos-from-linux&quot;&gt;Building macOS from Linux&lt;&#x2F;h3&gt;
&lt;p&gt;This also works in the other direction:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix build .#packages.aarch64-darwin.my-cli-static
nix build .#packages.x86_64-darwin.my-cli-static
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
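&lt;p&gt;These are native Darwin derivations, so Nix needs a Darwin builder (or a binary cache hit) to produce them; nothing is cross-compiled on the Linux host. Cross-compiling to macOS from Linux requires Apple’s SDK and is far less well supported in nixpkgs, so if you have a Mac available, build Darwin targets there.&lt;&#x2F;p&gt;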
&lt;h3 id=&quot;all-four-static-targets&quot;&gt;All four static targets&lt;&#x2F;h3&gt;
&lt;p&gt;With &lt;code&gt;eachDefaultSystem&lt;&#x2F;code&gt; + the platform-conditional static build, one flake produces:&lt;&#x2F;p&gt;
&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Target&lt;&#x2F;th&gt;&lt;th&gt;Linking&lt;&#x2F;th&gt;&lt;th&gt;Technique&lt;&#x2F;th&gt;&lt;&#x2F;tr&gt;&lt;&#x2F;thead&gt;&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;x86_64-linux&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Fully static (musl)&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;pkgsStatic&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;aarch64-linux&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Fully static (musl)&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;pkgsStatic&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;x86_64-darwin&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Partial static + rewrite&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;crt-static&lt;&#x2F;code&gt; + &lt;code&gt;install_name_tool&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;tr&gt;&lt;td&gt;&lt;code&gt;aarch64-darwin&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;td&gt;Partial static + rewrite&lt;&#x2F;td&gt;&lt;td&gt;&lt;code&gt;crt-static&lt;&#x2F;code&gt; + &lt;code&gt;install_name_tool&lt;&#x2F;code&gt;&lt;&#x2F;td&gt;&lt;&#x2F;tr&gt;
&lt;&#x2F;tbody&gt;&lt;&#x2F;table&gt;
&lt;h2 id=&quot;cargo-toml-optimize-for-release-size&quot;&gt;Cargo.toml: optimize for release size&lt;&#x2F;h2&gt;
&lt;p&gt;For CLI tools, optimize the release profile for binary size:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;toml&quot; class=&quot;language-toml &quot;&gt;&lt;code class=&quot;language-toml&quot; data-lang=&quot;toml&quot;&gt;[profile.release]
lto = true           # Link-time optimization — slower build, smaller binary
strip = true         # Strip debug symbols
codegen-units = 1    # Single codegen unit — better optimization, slower build
opt-level = &amp;quot;z&amp;quot;      # Optimize aggressively for size
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
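&lt;p&gt;This typically cuts binary size considerably, often by 50–70% compared to the default release settings. The cost is longer builds and, with &lt;code&gt;opt-level = &quot;z&quot;&lt;&#x2F;code&gt;, some runtime speed.&lt;&#x2F;p&gt;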
&lt;h2 id=&quot;avoid-openssl-use-rustls&quot;&gt;Avoid OpenSSL — use rustls&lt;&#x2F;h2&gt;
&lt;p&gt;If your CLI makes HTTPS requests, use &lt;code&gt;rustls&lt;&#x2F;code&gt; (pure Rust TLS) instead of &lt;code&gt;openssl-sys&lt;&#x2F;code&gt;. OpenSSL is a C dependency that’s painful to cross-compile and link statically. Most Rust HTTP&#x2F;gRPC crates support a &lt;code&gt;rustls&lt;&#x2F;code&gt; feature flag:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;toml&quot; class=&quot;language-toml &quot;&gt;&lt;code class=&quot;language-toml&quot; data-lang=&quot;toml&quot;&gt;[dependencies]
reqwest = { version = &amp;quot;0.12&amp;quot;, default-features = false, features = [&amp;quot;rustls-tls&amp;quot;] }
# or for gRPC:
tonic = { version = &amp;quot;0.12&amp;quot;, features = [&amp;quot;tls-rustls&amp;quot;] }
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
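&lt;p&gt;That takes OpenSSL out of the build. rustls still needs a crypto backend (ring or aws-lc-rs, depending on the crate and version) that compiles some C and assembly. Cargo builds that code through the &lt;code&gt;cc&lt;&#x2F;code&gt; crate, and it usually cross-compiles far more easily than a system OpenSSL.&lt;&#x2F;p&gt;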
&lt;h2 id=&quot;protobuf-codegen&quot;&gt;Protobuf codegen&lt;&#x2F;h2&gt;
&lt;p&gt;If your CLI uses Protocol Buffers (e.g., gRPC), provide &lt;code&gt;protoc&lt;&#x2F;code&gt; via Nix:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;nativeBuildInputs = [ pkgs.protobuf ];
PROTOC = &amp;quot;${pkgs.protobuf}&amp;#x2F;bin&amp;#x2F;protoc&amp;quot;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With a &lt;code&gt;build.rs&lt;&#x2F;code&gt; that runs &lt;code&gt;tonic_build&lt;&#x2F;code&gt; or &lt;code&gt;prost_build&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;rust&quot; class=&quot;language-rust &quot;&gt;&lt;code class=&quot;language-rust&quot; data-lang=&quot;rust&quot;&gt;fn main() -&amp;gt; Result&amp;lt;(), Box&amp;lt;dyn std::error::Error&amp;gt;&amp;gt; {
    tonic_build::configure()
        .build_server(false)
        .compile_protos(&amp;amp;[&amp;quot;proto&amp;#x2F;service.proto&amp;quot;], &amp;amp;[&amp;quot;proto&amp;quot;])?;
    Ok(())
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
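&lt;p&gt;Protobuf codegen only produces Rust source, so its output is platform-independent and needs no special cross-compilation handling. &lt;code&gt;protoc&lt;&#x2F;code&gt; itself runs on the build machine, which is why it goes in &lt;code&gt;nativeBuildInputs&lt;&#x2F;code&gt;. Setting &lt;code&gt;PROTOC&lt;&#x2F;code&gt; makes the build script use that binary.&lt;&#x2F;p&gt;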
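&lt;p&gt;In context, the two protobuf attributes simply sit inside the package definition. The sketch below builds on the baseline package and assumes &lt;code&gt;build.rs&lt;&#x2F;code&gt; and &lt;code&gt;proto&#x2F;&lt;&#x2F;code&gt; live at the crate root. The static variants would need the same two attributes.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;packages.my-cli = pkgs.rustPlatform.buildRustPackage {
  pname = &amp;quot;my-cli&amp;quot;;
  version = &amp;quot;0.1.0&amp;quot;;
  src = .&amp;#x2F;.;
  cargoLock.lockFile = .&amp;#x2F;Cargo.lock;

  # protoc runs on the build machine while build.rs executes, so it is a
  # native (build-time) input rather than a runtime dependency.
  nativeBuildInputs = [ pkgs.protobuf ];
  # prost-build (used by tonic-build) reads PROTOC to locate the compiler.
  PROTOC = &amp;quot;${pkgs.protobuf}&amp;#x2F;bin&amp;#x2F;protoc&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;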
&lt;h2 id=&quot;making-it-runnable-via-nix-run&quot;&gt;Making it runnable via &lt;code&gt;nix run&lt;&#x2F;code&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;Register the package as an app:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;apps.my-cli = {
  type = &amp;quot;app&amp;quot;;
  program = &amp;quot;${self.packages.${system}.my-cli}&amp;#x2F;bin&amp;#x2F;my-cli&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then: &lt;code&gt;nix run .#my-cli -- --help&lt;&#x2F;code&gt;&lt;&#x2F;p&gt;
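&lt;p&gt;&lt;code&gt;nix run&lt;&#x2F;code&gt; can also launch a package directly. It uses &lt;code&gt;meta.mainProgram&lt;&#x2F;code&gt;, or failing that the package name, to pick the executable. If the binary name ever differs from &lt;code&gt;pname&lt;&#x2F;code&gt;, a sketch like this keeps &lt;code&gt;nix run&lt;&#x2F;code&gt; pointed at the right file (the binary name &lt;code&gt;mycli&lt;&#x2F;code&gt; is hypothetical):&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;packages.my-cli = pkgs.rustPlatform.buildRustPackage {
  pname = &amp;quot;my-cli&amp;quot;;
  version = &amp;quot;0.1.0&amp;quot;;
  src = .&amp;#x2F;.;
  cargoLock.lockFile = .&amp;#x2F;Cargo.lock;
  # Hypothetical: the crate&amp;#x27;s [[bin]] target is named `mycli`, not `my-cli`.
  meta.mainProgram = &amp;quot;mycli&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;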
&lt;h2 id=&quot;why-not-crane-or-naersk&quot;&gt;Why not crane or naersk?&lt;&#x2F;h2&gt;
&lt;p&gt;&lt;code&gt;rustPlatform.buildRustPackage&lt;&#x2F;code&gt; is built into nixpkgs and works natively with &lt;code&gt;pkgsStatic&lt;&#x2F;code&gt;. External tools like crane or naersk build your dependencies in a separate derivation so they are cached between builds, which is useful for CI. For a single CLI binary, though, the added complexity isn’t worth it. &lt;code&gt;cargoLock.lockFile&lt;&#x2F;code&gt; gives you reproducibility and &lt;code&gt;pkgsStatic&lt;&#x2F;code&gt; gives you musl, and that’s all you need.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;putting-it-together&quot;&gt;Putting it together&lt;&#x2F;h2&gt;
&lt;p&gt;The full pattern fits in one flake: a dynamic build for development and testing, a static&#x2F;portable build for distribution, outputs for all four targets, and &lt;code&gt;nix run&lt;&#x2F;code&gt; support. The platform-conditional logic handles the Linux&#x2F;macOS split transparently, so consumers of the flake don’t need to think about it. They run &lt;code&gt;nix build .#my-cli-static&lt;&#x2F;code&gt; and get a binary that works anywhere their target OS runs.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Deploying Elixir&#x2F;Phoenix on NixOS — What Actually Works</title>
        <published>2026-03-17T14:00:00+00:00</published>
        <updated>2026-03-17T14:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/elixir-phoenix-nixos/"/>
        <id>https://perlpimp.net/blog/elixir-phoenix-nixos/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/elixir-phoenix-nixos/">&lt;p&gt;So this started, as these things usually do, with wanting to run staging and production on the same box. One Phoenix app, two instances, different ports, different databases, different secrets. NixOS should make this declarative and clean. And it does — eventually. But there is a surprising amount of ceremony involved in getting an Elixir release to build under Nix, and the multi-instance NixOS module pattern has enough moving parts that you will get at least three of them wrong on the first try.&lt;&#x2F;p&gt;
&lt;p&gt;This post covers the full path: building the Elixir release in Nix, structuring the NixOS module for multiple instances, wiring up PostgreSQL and secrets without shell script hacks, and the various traps along the way.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;building-the-release-in-nix&quot;&gt;Building the release in Nix&lt;&#x2F;h2&gt;
&lt;p&gt;Elixir releases under Nix use &lt;code&gt;mixRelease&lt;&#x2F;code&gt; from the BEAM package set. Pin your Erlang version explicitly — don’t let it float with nixpkgs:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;beamPackages = pkgs.beam.packages.erlang_27;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Fetch mix dependencies offline with &lt;code&gt;fetchMixDeps&lt;&#x2F;code&gt;. This is Nix’s equivalent of &lt;code&gt;mix deps.get&lt;&#x2F;code&gt;, but reproducible and sandboxed:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;mixFodDeps = beamPackages.fetchMixDeps {
  pname = &amp;quot;my-app-mix-deps&amp;quot;;
  version = &amp;quot;0.1.0&amp;quot;;
  src = .&amp;#x2F;.;
  sha256 = &amp;quot;sha256-AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;sha256&lt;&#x2F;code&gt; is a fixed-output derivation hash. Get the real value by running the build once with a fake hash and letting Nix tell you the correct one.&lt;&#x2F;p&gt;
&lt;p&gt;The Nix sandbox has no network access, which means esbuild and tailwind cannot self-install the way they normally do via mix. You have to hand them the binaries from nixpkgs:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;packages.default = beamPackages.mixRelease {
  pname = &amp;quot;my-app&amp;quot;;
  version = &amp;quot;0.1.0&amp;quot;;
  src = .&amp;#x2F;.;

  inherit mixFodDeps;

  MIX_ESBUILD_PATH = &amp;quot;${pkgs.esbuild}&amp;#x2F;bin&amp;#x2F;esbuild&amp;quot;;
  MIX_TAILWIND_PATH = &amp;quot;${pkgs.tailwindcss}&amp;#x2F;bin&amp;#x2F;tailwindcss&amp;quot;;

  postBuild = &amp;#x27;&amp;#x27;
    mix assets.deploy --no-deps-check
  &amp;#x27;&amp;#x27;;

  fixupPhase = &amp;#x27;&amp;#x27;
    echo &amp;quot;my_app_cookie&amp;quot; &amp;gt; $out&amp;#x2F;releases&amp;#x2F;COOKIE
  &amp;#x27;&amp;#x27;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
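&lt;p&gt;The &lt;code&gt;--no-deps-check&lt;&#x2F;code&gt; on &lt;code&gt;mix assets.deploy&lt;&#x2F;code&gt; is important: deps are already compiled at that point, and the check would fail in the sandbox. The cookie written in &lt;code&gt;fixupPhase&lt;&#x2F;code&gt; is for Erlang distribution; set it to something deterministic so nodes can cluster. There are two caveats. First, overriding &lt;code&gt;fixupPhase&lt;&#x2F;code&gt; replaces the default fixup steps entirely rather than adding to them. Second, everything under &lt;code&gt;$out&lt;&#x2F;code&gt; is world-readable in the Nix store, so treat this cookie as a cluster identifier rather than a secret. Elixir releases also read a &lt;code&gt;RELEASE_COOKIE&lt;&#x2F;code&gt; environment variable at runtime if you need to supply the cookie another way.&lt;&#x2F;p&gt;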
&lt;p&gt;One more thing in &lt;code&gt;mix.exs&lt;&#x2F;code&gt; — disable compile-time environment validation in your release config:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;elixir&quot; class=&quot;language-elixir &quot;&gt;&lt;code class=&quot;language-elixir&quot; data-lang=&quot;elixir&quot;&gt;releases: [my_app: [validate_compile_env: false]]
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
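&lt;p&gt;The build sandbox and the server are different environments with different env vars. Values read with &lt;code&gt;Application.compile_env&lt;&#x2F;code&gt; are baked in at build time and checked against the application config again when the release boots. If the runtime configuration produces a different value, the release refuses to start. Disabling the check avoids that, but such a value then silently keeps its build-time setting, so read environment-dependent settings in &lt;code&gt;runtime.exs&lt;&#x2F;code&gt; wherever you can.&lt;&#x2F;p&gt;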
&lt;h2 id=&quot;heroicons-skip-mix-wire-it-manually&quot;&gt;Heroicons: skip mix, wire it manually&lt;&#x2F;h2&gt;
&lt;p&gt;If your Phoenix app uses heroicons, you have likely already discovered that it is declared in &lt;code&gt;mix.exs&lt;&#x2F;code&gt; as a GitHub dependency, so mix wants to fetch it with a git clone. That does not work in a Nix sandbox. The fix is to fetch heroicons separately and symlink it into &lt;code&gt;deps&#x2F;&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;heroiconsSrc = pkgs.fetchFromGitHub {
  owner = &amp;quot;tailwindlabs&amp;quot;;
  repo = &amp;quot;heroicons&amp;quot;;
  rev = &amp;quot;v2.2.0&amp;quot;;
  hash = &amp;quot;sha256-...&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;In the package’s &lt;code&gt;preBuild&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;preBuild = &amp;#x27;&amp;#x27;
  mkdir -p deps
  ln -sfn ${heroiconsSrc} deps&amp;#x2F;heroicons
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Mirror this in your devShell so local development sees the same thing:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;shellHook = &amp;#x27;&amp;#x27;
  mkdir -p deps
  ln -sfn ${heroiconsSrc} deps&amp;#x2F;heroicons
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Both environments resolve heroicons identically. No hex install, no git clone, no network.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-multi-instance-module&quot;&gt;The multi-instance module&lt;&#x2F;h2&gt;
&lt;p&gt;The core idea is an &lt;code&gt;instances&lt;&#x2F;code&gt; attrset where each key is an instance name and each value is a submodule with its own ports, database, secrets, and system user. One NixOS module produces N systemd services, N PostgreSQL databases, and N nginx vhosts from a single declaration.&lt;&#x2F;p&gt;
&lt;p&gt;Start with the root options:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;options.services.myApp = {
  instances = mkOption {
    type = types.attrsOf (types.submodule instanceModule);
    default = {};
  };
  package = mkOption { type = types.package; };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The instance submodule defines everything an instance needs to run:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;instanceModule = { name, config, ... }: {
  options = {
    enable = mkOption { type = types.bool; default = true; };
    package = mkOption { type = types.package; default = cfg.package; };
    user = mkOption {
      type = types.str;
      default = &amp;quot;my_app_${sanitizeName name}&amp;quot;;
    };
    group = mkOption { type = types.str; default = config.user; };
    listenAddress = mkOption { type = types.str; default = &amp;quot;127.0.0.1&amp;quot;; };
    port = mkOption { type = types.port; default = 4000; };
    metricsPort = mkOption { type = types.nullOr types.port; default = null; };
    host = mkOption { type = types.str; };
    scheme = mkOption {
      type = types.enum [ &amp;quot;http&amp;quot; &amp;quot;https&amp;quot; ];
      default = &amp;quot;https&amp;quot;;
    };
    secretKeyBaseFile = mkOption { type = types.path; };
    database = {
      host = mkOption { type = types.str; default = &amp;quot;&amp;#x2F;run&amp;#x2F;postgresql&amp;quot;; };
      name = mkOption { type = types.str; default = config.user; };
      passwordFile = mkOption {
        type = types.nullOr types.path;
        default = null;
      };
    };
    autoMigrate = mkOption { type = types.bool; default = false; };
    nginxHelper = {
      enable = mkOption { type = types.bool; default = false; };
      domain = mkOption { type = types.str; default = config.host; };
      enableACME = mkOption { type = types.bool; default = false; };
    };
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Note the &lt;code&gt;sanitizeName&lt;&#x2F;code&gt; helper, which turns dashes in the instance name into underscores:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;sanitizeName = name: builtins.replaceStrings [ &amp;quot;-&amp;quot; ] [ &amp;quot;_&amp;quot; ] name;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;An instance named &lt;code&gt;&quot;production&quot;&lt;&#x2F;code&gt; gets user &lt;code&gt;my_app_production&lt;&#x2F;code&gt;. An instance named &lt;code&gt;&quot;us-east&quot;&lt;&#x2F;code&gt; gets user &lt;code&gt;my_app_us_east&lt;&#x2F;code&gt;. Linux user names may contain dashes, but the same string is reused as the PostgreSQL role and database name and in the Erlang node name. An underscore-only name can be used unquoted everywhere, including in SQL.&lt;&#x2F;p&gt;
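&lt;p&gt;Put together, a host configuration declaring staging and production side by side might look like the sketch below. The domains, ports, and secret paths are hypothetical, as is the &lt;code&gt;myAppPackage&lt;&#x2F;code&gt; binding, which stands for however your configuration gets at the built release. The secret paths are plain strings so that nothing is copied into the Nix store.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.myApp = {
  package = myAppPackage;  # hypothetical binding, e.g. the flake&amp;#x27;s release
  instances = {
    production = {
      port = 4000;
      host = &amp;quot;app.example.com&amp;quot;;
      secretKeyBaseFile = &amp;quot;&amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;my-app-production-secret-key-base&amp;quot;;
      nginxHelper = { enable = true; enableACME = true; };
    };
    staging = {
      port = 4001;  # must differ from production&amp;#x27;s port
      host = &amp;quot;staging.example.com&amp;quot;;
      secretKeyBaseFile = &amp;quot;&amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;my-app-staging-secret-key-base&amp;quot;;
      nginxHelper.enable = true;
    };
  };
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With the module above, this yields the systemd services &lt;code&gt;my-app-production&lt;&#x2F;code&gt; and &lt;code&gt;my-app-staging&lt;&#x2F;code&gt;, each running as a user (and owning a local database) of the same sanitized name.&lt;&#x2F;p&gt;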
&lt;h2 id=&quot;filtering-instances&quot;&gt;Filtering instances&lt;&#x2F;h2&gt;
&lt;p&gt;Not every piece of config applies to every instance. An instance using an external database should not trigger local PostgreSQL provisioning. An instance without nginx should not generate a vhost. Filter the instances into categories early:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;enabledInstances = filterAttrs (_: icfg: icfg.enable) cfg.instances;
localPgInstances = filterAttrs (_: icfg:
  icfg.database.passwordFile == null
) enabledInstances;
nginxInstances = filterAttrs (_: icfg:
  icfg.nginxHelper.enable
) enabledInstances;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then gate each config section on the relevant subset. This prevents one instance’s settings from dragging in services that another instance does not need.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;one-user-per-instance-no-exceptions&quot;&gt;One user per instance, no exceptions&lt;&#x2F;h2&gt;
&lt;p&gt;Each instance gets its own system user and group. This is not optional. If staging and production share a user, a misconfigured secret path in staging can expose production credentials. systemd’s &lt;code&gt;LoadCredential&lt;&#x2F;code&gt; binds files to the service’s user context — sharing users breaks that isolation.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;users.users = mapAttrs&amp;#x27; (_: icfg:
  nameValuePair icfg.user {
    isSystemUser = true;
    group = icfg.group;
    home = &amp;quot;&amp;#x2F;var&amp;#x2F;lib&amp;#x2F;my-app&amp;quot;;
  }
) enabledInstances;

users.groups = mapAttrs&amp;#x27; (_: icfg:
  nameValuePair icfg.group {}
) enabledInstances;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;assertions-catch-mistakes-at-eval-time&quot;&gt;Assertions catch mistakes at eval time&lt;&#x2F;h2&gt;
&lt;p&gt;When you have multiple instances, the most common misconfiguration is duplicate values — two instances claiming the same port, database name, or domain. These should fail at &lt;code&gt;nixos-rebuild&lt;&#x2F;code&gt; time, not at 3am when the second service fails to bind:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;assertions = let
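  # Collect one field from every enabled instance.
  allValues = f: mapAttrsToList (_: f) enabledInstances;
  allPorts = allValues (icfg: icfg.port);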
  allDomains = mapAttrsToList (_: icfg:
    icfg.nginxHelper.domain
  ) nginxInstances;
  allDbNames = allValues (icfg: icfg.database.name);
in [
  {
    assertion = length allPorts == length (unique allPorts);
    message = &amp;quot;myApp: all port values must be unique across instances.&amp;quot;;
  }
  {
    assertion = length allDomains == length (unique allDomains);
    message = &amp;quot;myApp: all nginx domains must be unique across instances.&amp;quot;;
  }
  {
    assertion = length allDbNames == length (unique allDbNames);
    message = &amp;quot;myApp: all database names must be unique across instances.&amp;quot;;
  }
];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This turns a runtime mystery into a build-time error with a clear message. Add assertions for every value that must be unique — ports, gRPC ports, metrics ports, domains, database names.&lt;&#x2F;p&gt;
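&lt;p&gt;Because &lt;code&gt;metricsPort&lt;&#x2F;code&gt; is nullable, a small helper keeps the list manageable as it grows. The sketch below is a hypothetical replacement for the &lt;code&gt;assertions&lt;&#x2F;code&gt; list above, since the attribute can only be defined once in the same attribute set. It assumes lib’s functions are in scope, as in the rest of the module. It drops &lt;code&gt;null&lt;&#x2F;code&gt; values and checks &lt;code&gt;port&lt;&#x2F;code&gt; and &lt;code&gt;metricsPort&lt;&#x2F;code&gt; as one pool, so a metrics port cannot collide with any instance’s HTTP port either.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;assertions = let
  # Hypothetical helper: f returns a list of values per instance; nulls are
  # dropped, and the assertion fails if any remaining value appears twice.
  uniqueAcross = label: f: let
    values = filter (v: v != null)
      (concatLists (mapAttrsToList (_: f) enabledInstances));
  in {
    assertion = length values == length (unique values);
    message = &amp;quot;myApp: all ${label} values must be unique across instances.&amp;quot;;
  };
in [
  (uniqueAcross &amp;quot;listening port&amp;quot; (icfg: [ icfg.port icfg.metricsPort ]))
  (uniqueAcross &amp;quot;database name&amp;quot; (icfg: [ icfg.database.name ]))
  (uniqueAcross &amp;quot;nginx domain&amp;quot; (icfg:
    optional icfg.nginxHelper.enable icfg.nginxHelper.domain))
];
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;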
&lt;h2 id=&quot;generating-systemd-services&quot;&gt;Generating systemd services&lt;&#x2F;h2&gt;
&lt;p&gt;One systemd service per instance, generated with &lt;code&gt;mapAttrs&#x27;&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;systemd.services = mapAttrs&amp;#x27; (name: icfg:
  let
    stateDir = &amp;quot;my-app&amp;#x2F;${name}&amp;quot;;
    runtimeDir = &amp;quot;my-app-${name}&amp;quot;;
    pkg = icfg.package;
  in nameValuePair &amp;quot;my-app-${name}&amp;quot; {
    description = &amp;quot;My App (${name})&amp;quot;;
    wantedBy = [ &amp;quot;multi-user.target&amp;quot; ];
    after = [ &amp;quot;network.target&amp;quot; &amp;quot;postgresql.service&amp;quot; ];
    wants = [ &amp;quot;postgresql.service&amp;quot; ];

    environment = {
      PHX_SERVER = &amp;quot;true&amp;quot;;
      PORT = toString icfg.port;
      LISTEN_ADDRESS = icfg.listenAddress;
      PHX_HOST = icfg.host;
      RELEASE_NODE = &amp;quot;my_app_${sanitizeName name}&amp;quot;;
      RELEASE_TMP = &amp;quot;&amp;#x2F;tmp&amp;#x2F;my-app-${name}&amp;quot;;
      DATABASE_HOST = icfg.database.host;
      DATABASE_NAME = icfg.database.name;
      DATABASE_USER = icfg.user;
      LANG = &amp;quot;en_US.UTF-8&amp;quot;;
    };

    serviceConfig = {
      Type = &amp;quot;exec&amp;quot;;
      User = icfg.user;
      Group = icfg.group;
      Restart = &amp;quot;on-failure&amp;quot;;
      RestartSec = 5;
      WorkingDirectory = &amp;quot;&amp;#x2F;var&amp;#x2F;lib&amp;#x2F;${stateDir}&amp;quot;;
      StateDirectory = stateDir;
      RuntimeDirectory = runtimeDir;

      NoNewPrivileges = true;
      ProtectSystem = &amp;quot;strict&amp;quot;;
      ProtectHome = true;
      PrivateTmp = true;
      PrivateDevices = true;
      ProtectKernelTunables = true;
      ProtectKernelModules = true;
      ProtectControlGroups = true;
      RestrictSUIDSGID = true;
      RemoveIPC = true;

      LoadCredential = lib.filter (x: x != null) [
        &amp;quot;secret_key_base:${icfg.secretKeyBaseFile}&amp;quot;
        (if icfg.database.passwordFile != null
         then &amp;quot;db_password:${icfg.database.passwordFile}&amp;quot;
         else null)
      ];
    };

    script = &amp;#x27;&amp;#x27;
      export SECRET_KEY_BASE=&amp;quot;$(&amp;lt; $CREDENTIALS_DIRECTORY&amp;#x2F;secret_key_base)&amp;quot;
      ${lib.optionalString (icfg.database.passwordFile != null) &amp;#x27;&amp;#x27;
        export DATABASE_PASSWORD=&amp;quot;$(&amp;lt; $CREDENTIALS_DIRECTORY&amp;#x2F;db_password)&amp;quot;
      &amp;#x27;&amp;#x27;}
      ${lib.optionalString icfg.autoMigrate &amp;#x27;&amp;#x27;
        ${pkg}&amp;#x2F;bin&amp;#x2F;my_app eval &amp;#x27;MyApp.Release.migrate()&amp;#x27;
      &amp;#x27;&amp;#x27;}
      exec ${pkg}&amp;#x2F;bin&amp;#x2F;my_app start
    &amp;#x27;&amp;#x27;;
  }
) enabledInstances;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A few things worth calling out:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;RELEASE_NODE&lt;&#x2F;code&gt; must be unique per instance.&lt;&#x2F;strong&gt; Erlang uses this to identify nodes for clustering and remote shells. Two instances with the same node name on the same host will collide. The second node cannot register the name with epmd, so it fails to start distributed Erlang, and tools such as &lt;code&gt;bin&#x2F;my_app remote&lt;&#x2F;code&gt; can end up attached to the wrong instance’s runtime.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;StateDirectory&lt;&#x2F;code&gt; and &lt;code&gt;RuntimeDirectory&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; are namespaced per instance. systemd creates these automatically with the correct ownership. No &lt;code&gt;mkdir -p&lt;&#x2F;code&gt; in shell scripts, no &lt;code&gt;chown&lt;&#x2F;code&gt; hacks.&lt;&#x2F;p&gt;
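&lt;p&gt;&lt;strong&gt;Security hardening&lt;&#x2F;strong&gt; is applied uniformly. &lt;code&gt;ProtectSystem = &quot;strict&quot;&lt;&#x2F;code&gt; mounts the file system read-only for the service, apart from the API file systems (&lt;code&gt;&#x2F;dev&lt;&#x2F;code&gt;, &lt;code&gt;&#x2F;proc&lt;&#x2F;code&gt;, &lt;code&gt;&#x2F;sys&lt;&#x2F;code&gt;) and the directories systemd grants it, here the state and runtime directories. &lt;code&gt;PrivateTmp&lt;&#x2F;code&gt; gives each service its own writable &lt;code&gt;&#x2F;tmp&lt;&#x2F;code&gt;, which is where &lt;code&gt;RELEASE_TMP&lt;&#x2F;code&gt; points. This isolation costs almost nothing, so there is little reason not to use it.&lt;&#x2F;p&gt;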
&lt;h2 id=&quot;frontload-everything-in-the-start-script&quot;&gt;Frontload everything in the start script&lt;&#x2F;h2&gt;
&lt;p&gt;Elixir releases read config at boot via &lt;code&gt;runtime.exs&lt;&#x2F;code&gt;. You cannot inject environment variables after the BEAM starts — there is no hot-reload of system env in a running release.&lt;&#x2F;p&gt;
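&lt;p&gt;Do not use systemd &lt;code&gt;Environment=&lt;&#x2F;code&gt; for secrets. Those values are written into the unit file, which on NixOS lives in the world-readable Nix store, and any local user can read them back with &lt;code&gt;systemctl show&lt;&#x2F;code&gt;. Use &lt;code&gt;LoadCredential&lt;&#x2F;code&gt; instead: systemd places the file contents in &lt;code&gt;$CREDENTIALS_DIRECTORY&lt;&#x2F;code&gt;, and the start script reads them into env vars before exec. For the same reason, give &lt;code&gt;secretKeyBaseFile&lt;&#x2F;code&gt; and &lt;code&gt;passwordFile&lt;&#x2F;code&gt; as strings pointing at runtime paths rather than as Nix path literals, because a path literal would be copied into the store.&lt;&#x2F;p&gt;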
&lt;p&gt;The pattern is: non-secret config goes in &lt;code&gt;environment&lt;&#x2F;code&gt;, secrets go through &lt;code&gt;LoadCredential&lt;&#x2F;code&gt;, and the &lt;code&gt;script&lt;&#x2F;code&gt; block ties them together. Migrations run in the same script block because they need the same env vars. Do not try to run migrations in a separate &lt;code&gt;ExecStartPre&lt;&#x2F;code&gt;. It runs as its own process and will not see the variables the script exports from &lt;code&gt;$CREDENTIALS_DIRECTORY&lt;&#x2F;code&gt; unless you duplicate that step there.&lt;&#x2F;p&gt;
&lt;p&gt;All env, all secrets, all pre-start tasks — one script block. One context. Nothing leaking between phases.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;postgresql-without-shell-script-hacks&quot;&gt;PostgreSQL without shell script hacks&lt;&#x2F;h2&gt;
&lt;p&gt;The temptation is to use &lt;code&gt;ExecStartPre&lt;&#x2F;code&gt; to run &lt;code&gt;createuser&lt;&#x2F;code&gt;, &lt;code&gt;createdb&lt;&#x2F;code&gt;, and &lt;code&gt;psql -c &quot;GRANT ...&quot;&lt;&#x2F;code&gt;. This works until it does not. You end up fighting race conditions with the postgresql service, handling “already exists” errors with &lt;code&gt;|| true&lt;&#x2F;code&gt;, and accumulating shell that breaks on the next NixOS upgrade.&lt;&#x2F;p&gt;
&lt;p&gt;The right approach: the system user the service runs under has the same name as the database. NixOS wires this correctly through &lt;code&gt;ensureDatabases&lt;&#x2F;code&gt; and &lt;code&gt;ensureUsers&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.postgresql = mkIf (localPgInstances != {}) {
  ensureDatabases = mapAttrsToList (_: icfg:
    icfg.database.name
  ) localPgInstances;
  ensureUsers = let
    dbUsers = unique (mapAttrsToList (_: icfg:
      icfg.user
    ) localPgInstances);
  in map (u: {
    name = u;
    ensureDBOwnership = true;
  }) dbUsers;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;With &lt;code&gt;ensureDBOwnership = true&lt;&#x2F;code&gt;, each PostgreSQL role owns the database that shares its name, which is why the user and database names have to match. The app connects over a Unix socket in &lt;code&gt;&#x2F;run&#x2F;postgresql&lt;&#x2F;code&gt; using peer authentication: no password needed for local connections, no &lt;code&gt;pg_hba.conf&lt;&#x2F;code&gt; hacks.&lt;&#x2F;p&gt;
&lt;p&gt;In &lt;code&gt;runtime.exs&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
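&lt;pre data-lang=&quot;elixir&quot; class=&quot;language-elixir &quot;&gt;&lt;code class=&quot;language-elixir&quot; data-lang=&quot;elixir&quot;&gt;import Config

db_host = System.get_env(&amp;quot;DATABASE_HOST&amp;quot;, &amp;quot;&amp;#x2F;run&amp;#x2F;postgresql&amp;quot;)
# A leading &amp;quot;&amp;#x2F;&amp;quot; means a Unix socket directory, anything else a TCP hostname
host_opts =
  if String.starts_with?(db_host, &amp;quot;&amp;#x2F;&amp;quot;),
    do: [socket_dir: db_host],
    else: [hostname: db_host]

# Illustrative wiring: :my_app, MyApp.Repo and DATABASE_NAME are placeholder names
config :my_app, MyApp.Repo,
  host_opts ++ [database: System.fetch_env!(&amp;quot;DATABASE_NAME&amp;quot;)]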
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This handles both the NixOS case (Unix socket) and external databases (TCP hostname) from the same code path. The gating on &lt;code&gt;localPgInstances&lt;&#x2F;code&gt; ensures that instances using a remote database URL do not trigger local PostgreSQL provisioning.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;nginx-with-content-type-routing&quot;&gt;Nginx with content-type routing&lt;&#x2F;h2&gt;
&lt;p&gt;When an instance enables &lt;code&gt;nginxHelper&lt;&#x2F;code&gt;, the module auto-generates an nginx vhost. The base vhost proxies every path, WebSocket upgrades included, to the instance’s HTTP port; if your app also serves gRPC on a different port, this location is where routing on content type would go:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.nginx = mkIf (nginxInstances != {}) {
  enable = true;
  virtualHosts = mapAttrs&amp;#x27; (_: icfg:
    nameValuePair icfg.nginxHelper.domain ({
      forceSSL = icfg.nginxHelper.enableACME;
      enableACME = icfg.nginxHelper.enableACME;
      locations.&amp;quot;~ ^&amp;#x2F;&amp;quot; = {
        proxyPass = &amp;quot;http:&amp;#x2F;&amp;#x2F;${icfg.listenAddress}:${toString icfg.port}&amp;quot;;
        proxyWebsockets = true;
      };
    })
  ) nginxInstances;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;&lt;code&gt;proxyWebsockets = true&lt;&#x2F;code&gt; is essential for Phoenix LiveView. Without it, the WebSocket upgrade fails and LiveView falls back to long-polling — or just breaks, depending on your configuration.&lt;&#x2F;p&gt;
&lt;p&gt;ACME is optional per instance. Staging might use a test CA or no TLS at all. Production uses Let’s Encrypt. The module does not assume.&lt;&#x2F;p&gt;
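&lt;p&gt;The vhost above sends every request to the HTTP port. A hedged sketch of a content-type split, assuming a hypothetical &lt;code&gt;grpcPort&lt;&#x2F;code&gt; option on the instance. gRPC clients send a &lt;code&gt;Content-Type&lt;&#x2F;code&gt; beginning with &lt;code&gt;application&#x2F;grpc&lt;&#x2F;code&gt;, and nginx permits &lt;code&gt;grpc_pass&lt;&#x2F;code&gt; inside an &lt;code&gt;if&lt;&#x2F;code&gt; block within a location, so those requests can be diverted while everything else keeps the existing proxy. These attributes would replace the corresponding ones inside the vhost, next to &lt;code&gt;forceSSL&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# gRPC needs HTTP&amp;#x2F;2 between the client and nginx
http2 = true;
locations.&amp;quot;~ ^&amp;#x2F;&amp;quot; = {
  proxyPass = &amp;quot;http:&amp;#x2F;&amp;#x2F;${icfg.listenAddress}:${toString icfg.port}&amp;quot;;
  proxyWebsockets = true;
  # Requests that look like gRPC go to the (hypothetical) gRPC port instead
  extraConfig = &amp;#x27;&amp;#x27;
    if ($http_content_type ~ &amp;quot;^application&amp;#x2F;grpc&amp;quot;) {
      grpc_pass grpc:&amp;#x2F;&amp;#x2F;${icfg.listenAddress}:${toString icfg.grpcPort};
    }
  &amp;#x27;&amp;#x27;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;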
&lt;h2 id=&quot;putting-it-together&quot;&gt;Putting it together&lt;&#x2F;h2&gt;
&lt;p&gt;Here is what a two-instance deployment looks like from the consumer’s side:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  services.myApp = {
    package = my-app.packages.${pkgs.system}.default;

    instances = {
      production = {
        host = &amp;quot;app.example.com&amp;quot;;
        port = 4000;
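        # Quoted: as a path literal the secret could be copied into the world-readable Nix store
        secretKeyBaseFile = &amp;quot;&amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;prod-secret-key&amp;quot;;
        # Same name as the system user, as ensureDBOwnership requires
        database.name = &amp;quot;my_app_production&amp;quot;;
        autoMigrate = true;
        nginxHelper.enable = true;
        nginxHelper.enableACME = true;
      };
      staging = {
        host = &amp;quot;app-staging.example.com&amp;quot;;
        port = 4001;
        secretKeyBaseFile = &amp;quot;&amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;staging-secret-key&amp;quot;;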
        database.name = &amp;quot;my_app_staging&amp;quot;;
        autoMigrate = true;
        nginxHelper.enable = true;
        nginxHelper.enableACME = true;
      };
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;From this declaration, &lt;code&gt;nixos-rebuild&lt;&#x2F;code&gt; produces:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Two systemd services: &lt;code&gt;my-app-production&lt;&#x2F;code&gt;, &lt;code&gt;my-app-staging&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Two system users: &lt;code&gt;my_app_production&lt;&#x2F;code&gt;, &lt;code&gt;my_app_staging&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Two state directories under &lt;code&gt;&#x2F;var&#x2F;lib&#x2F;my-app&#x2F;&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Two PostgreSQL databases with matching owners&lt;&#x2F;li&gt;
&lt;li&gt;Two nginx vhosts with separate ACME certificates&lt;&#x2F;li&gt;
&lt;li&gt;Assertions that prevent port, domain, or database collisions&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;Add a third instance and you get a third of everything. Remove one and NixOS cleans up the service and vhost. The database and its PostgreSQL role persist: NixOS does not drop databases on removal, which is the correct default.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-i-would-do-differently&quot;&gt;What I would do differently&lt;&#x2F;h2&gt;
&lt;p&gt;If I were starting over, I would use &lt;code&gt;writeShellApplication&lt;&#x2F;code&gt; instead of bare &lt;code&gt;script&lt;&#x2F;code&gt; blocks for the pre-start logic. &lt;code&gt;writeShellApplication&lt;&#x2F;code&gt; runs shellcheck and adds &lt;code&gt;set -euo pipefail&lt;&#x2F;code&gt; automatically — it catches the kind of bugs that only surface in production when a credential file is missing and the script silently continues with an empty variable.&lt;&#x2F;p&gt;
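&lt;p&gt;A sketch of that refactor, with an illustrative &lt;code&gt;secret_key_base&lt;&#x2F;code&gt; credential and a &lt;code&gt;cfg.package&lt;&#x2F;code&gt; placeholder for the release. &lt;code&gt;writeShellApplication&lt;&#x2F;code&gt; runs shellcheck when the script is built, so a quoting mistake fails &lt;code&gt;nixos-rebuild&lt;&#x2F;code&gt; instead of the service:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;let
  # cfg.package is a placeholder for the Phoenix release derivation
  startScript = pkgs.writeShellApplication {
    name = &amp;quot;my-app-start&amp;quot;;
    runtimeInputs = [ pkgs.coreutils ];
    text = &amp;#x27;&amp;#x27;
      # errexit and nounset: a missing credential file or an unset
      # CREDENTIALS_DIRECTORY aborts the start instead of continuing empty.
      # Assignment and export are split so a failing cat is not masked.
      SECRET_KEY_BASE=&amp;quot;$(cat &amp;quot;$CREDENTIALS_DIRECTORY&amp;#x2F;secret_key_base&amp;quot;)&amp;quot;
      export SECRET_KEY_BASE
      exec ${cfg.package}&amp;#x2F;bin&amp;#x2F;server
    &amp;#x27;&amp;#x27;;
  };
in
{
  systemd.services.my-app-production.serviceConfig = {
    ExecStart = &amp;quot;${startScript}&amp;#x2F;bin&amp;#x2F;my-app-start&amp;quot;;
    LoadCredential = [ &amp;quot;secret_key_base:&amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;prod-secret-key&amp;quot; ];
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;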
&lt;p&gt;I would also add health checks from day one. systemd supports &lt;code&gt;Type = &quot;notify&quot;&lt;&#x2F;code&gt; with a watchdog, and there are Elixir libraries that integrate with it. Without health checks, a service that starts but stops accepting connections will sit there in “active (running)” state until someone notices manually.&lt;&#x2F;p&gt;
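&lt;p&gt;The systemd half of that is a few lines. A sketch, assuming the application (through one of those libraries or its own &lt;code&gt;sd_notify&lt;&#x2F;code&gt; calls) sends &lt;code&gt;READY=1&lt;&#x2F;code&gt; once it accepts connections and &lt;code&gt;WATCHDOG=1&lt;&#x2F;code&gt; well within each &lt;code&gt;WatchdogSec&lt;&#x2F;code&gt; interval; the 30-second value is arbitrary:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;systemd.services.my-app-production.serviceConfig = {
  # The unit counts as started only after the app sends READY=1
  Type = &amp;quot;notify&amp;quot;;
  # Accept notifications from any process in the unit, in case the release
  # script leaves the notifying BEAM process as a child of the main PID
  NotifyAccess = &amp;quot;all&amp;quot;;
  # No WATCHDOG=1 within 30s and systemd marks the service failed
  WatchdogSec = &amp;quot;30s&amp;quot;;
  # Restart after watchdog timeouts as well as crashes
  Restart = &amp;quot;on-failure&amp;quot;;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;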
&lt;p&gt;But those are refinements. The patterns above — typed instance submodules, &lt;code&gt;mapAttrs&#x27;&lt;&#x2F;code&gt; for service generation, &lt;code&gt;LoadCredential&lt;&#x2F;code&gt; for secrets, &lt;code&gt;ensureDatabases&lt;&#x2F;code&gt; for PostgreSQL, assertions for uniqueness — these are the load-bearing walls. Get them right and everything else is furniture.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Solving the NixOS SOPS Bootstrap Problem</title>
        <published>2026-03-16T20:00:00+00:00</published>
        <updated>2026-03-16T20:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/sops-bootstrap-problem/"/>
        <id>https://perlpimp.net/blog/sops-bootstrap-problem/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/sops-bootstrap-problem/">&lt;p&gt;You’ve just finished a &lt;a href=&quot;&#x2F;blog&#x2F;nixos-anywhere-ovh-kimsufi&#x2F;&quot;&gt;nixos-anywhere&lt;&#x2F;a&gt; deploy. The server rebooted into NixOS. You SSH in, feeling good about yourself, and run &lt;code&gt;nixos-rebuild switch --flake .#myhost&lt;&#x2F;code&gt;. Nix starts evaluating your flake, hits a private GitHub input, and dies:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;error: unable to download &amp;#x27;https:&amp;#x2F;&amp;#x2F;api.github.com&amp;#x2F;repos&amp;#x2F;yourorg&amp;#x2F;private-flake&amp;#x2F;tarball&amp;#x2F;...&amp;#x27;: HTTP error 404
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Your flake has private inputs. The access tokens that authenticate those fetches are managed by sops-nix, and sops-nix decrypts secrets during NixOS activation. But you haven’t activated the new configuration yet; that’s what you’re trying to do. The host can’t build its own config because building requires tokens that only exist once that config has been built and activated.&lt;&#x2F;p&gt;
&lt;p&gt;Welcome to the bootstrap problem.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;how-sops-nix-secrets-work-the-short-version&quot;&gt;How sops-nix secrets work (the short version)&lt;&#x2F;h2&gt;
&lt;p&gt;If you’re already familiar with sops-nix, skip ahead. For everyone else, here’s the minimum context.&lt;&#x2F;p&gt;
&lt;p&gt;Secrets live in encrypted YAML files in your repo, such as &lt;code&gt;secrets&#x2F;myhost.yaml&lt;&#x2F;code&gt;. A &lt;code&gt;.sops.yaml&lt;&#x2F;code&gt; file at the repo root maps path patterns to the age public keys that can decrypt them. Each NixOS host has an age key derived from its ed25519 SSH host key:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;ssh-keyscan myhost.example.com 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null | ssh-to-age
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;During NixOS activation, sops-nix takes those encrypted YAML files, decrypts them using the host’s age private key, and drops the plaintext into &lt;code&gt;&#x2F;run&#x2F;secrets&#x2F;&lt;&#x2F;code&gt;. In this setup, one of those secrets is &lt;code&gt;nix_builder_access_tokens&lt;&#x2F;code&gt;: a file containing GitHub PATs that Nix picks up through an &lt;code&gt;!include&lt;&#x2F;code&gt; line in &lt;code&gt;nix.extraOptions&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;nix.extraOptions = &amp;#x27;&amp;#x27;
  !include &amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;nix_builder_access_tokens
&amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
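&lt;p&gt;The secret itself is declared like any other sops-nix secret. A sketch, assuming the YAML key is named &lt;code&gt;nix_builder_access_tokens&lt;&#x2F;code&gt; and its value is written in &lt;code&gt;nix.conf&lt;&#x2F;code&gt; syntax (&lt;code&gt;access-tokens = github.com=&amp;lt;token&amp;gt;&lt;&#x2F;code&gt;). The &lt;code&gt;!&lt;&#x2F;code&gt; in &lt;code&gt;!include&lt;&#x2F;code&gt; matters: Nix silently skips an optional include whose file does not exist, which is why a missing secret surfaces as a plain 404 rather than a configuration error.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ config, ... }:
{
  # Stored encrypted in the repo (and, still encrypted, in the Nix store)
  sops.defaultSopsFile = .&amp;#x2F;secrets&amp;#x2F;myhost.yaml;

  # Decrypted during activation to &amp;#x2F;run&amp;#x2F;secrets&amp;#x2F;nix_builder_access_tokens
  sops.secrets.nix_builder_access_tokens = { };

  # The same include as above, using the path sops-nix reports
  nix.extraOptions = &amp;#x27;&amp;#x27;
    !include ${config.sops.secrets.nix_builder_access_tokens.path}
  &amp;#x27;&amp;#x27;;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;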
&lt;p&gt;This is the mechanism. Encrypted at rest, decrypted on activation, consumed by Nix for authenticated fetches. It works beautifully — once the system is running. The problem is the first time.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-chicken-and-the-egg&quot;&gt;The chicken and the egg&lt;&#x2F;h2&gt;
&lt;p&gt;After a fresh nixos-anywhere install, the host boots into a minimal NixOS. It has an SSH host key (nixos-anywhere generated one), so it &lt;em&gt;could&lt;&#x2F;em&gt; decrypt sops secrets — if those secrets had been deployed. But they haven’t been, because the first real &lt;code&gt;nixos-rebuild switch&lt;&#x2F;code&gt; is what deploys them.&lt;&#x2F;p&gt;
&lt;p&gt;And that first &lt;code&gt;nixos-rebuild switch&lt;&#x2F;code&gt; can’t complete, because it needs to fetch private flake inputs, which requires the access tokens, which are in the sops secrets, which haven’t been deployed yet.&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;Fresh install
  → needs nixos-rebuild switch to deploy sops secrets
    → nixos-rebuild needs private flake inputs
      → private flake inputs need access tokens
        → access tokens are in sops secrets
          → sops secrets haven&amp;#x27;t been deployed yet
            → goto: needs nixos-rebuild switch
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If you’ve ever written a Nix expression that infinitely recurses, this should feel familiar. Same energy, different layer of the stack.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-fix-build-somewhere-else&quot;&gt;The fix: build somewhere else&lt;&#x2F;h2&gt;
&lt;p&gt;The insight is simple: the &lt;em&gt;target&lt;&#x2F;em&gt; host can’t build its own config, but some other host that already has decrypted tokens can. You just need to separate where the build happens from where the result gets activated.&lt;&#x2F;p&gt;
&lt;p&gt;&lt;code&gt;nixos-rebuild&lt;&#x2F;code&gt; has two flags for exactly this:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;--target-host&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; — the machine where the built system closure gets copied to and activated&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;--build-host&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; — the machine where the Nix build actually runs&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix run nixpkgs#nixos-rebuild -- switch \
  --flake .#myhost \
  --target-host root@myhost.example.com \
  --build-host root@builder.example.com
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This tells nixos-rebuild: SSH into &lt;code&gt;builder.example.com&lt;&#x2F;code&gt;, run the build there (where access tokens are already decrypted and available), copy the resulting closure to &lt;code&gt;myhost.example.com&lt;&#x2F;code&gt;, and activate it. The target host never needs to fetch a single flake input. It just receives a pre-built system closure and switches to it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-nix-run-nixpkgs-nixos-rebuild&quot;&gt;Why &lt;code&gt;nix run nixpkgs#nixos-rebuild&lt;&#x2F;code&gt;&lt;&#x2F;h2&gt;
&lt;p&gt;If you’re managing NixOS servers from a Mac (as I am), &lt;code&gt;nixos-rebuild&lt;&#x2F;code&gt; isn’t in your PATH — it’s a NixOS tool, not a Nix tool. &lt;code&gt;nix run nixpkgs#nixos-rebuild --&lt;&#x2F;code&gt; runs it from nixpkgs without installing it. The &lt;code&gt;--&lt;&#x2F;code&gt; separates &lt;code&gt;nix run&lt;&#x2F;code&gt; flags from &lt;code&gt;nixos-rebuild&lt;&#x2F;code&gt; flags. This is one of those things that’s obvious in hindsight and confusing for about fifteen minutes when you first encounter it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;a-real-workflow-dedicated-server-after-nixos-anywhere&quot;&gt;A real workflow: dedicated server after nixos-anywhere&lt;&#x2F;h2&gt;
&lt;p&gt;Here’s the concrete scenario. You’ve just deployed NixOS onto a Kimsufi dedicated server using nixos-anywhere. The server rebooted, you can SSH in, NixOS is running. But this is the bare-bones config — no sops secrets deployed yet.&lt;&#x2F;p&gt;
&lt;p&gt;First, grab the new host’s age key (nixos-anywhere generates a new SSH host key on every run):&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;ssh-keyscan anubis.example.com 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null | ssh-to-age
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Update &lt;code&gt;.sops.yaml&lt;&#x2F;code&gt; with the new key, then re-encrypt:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;sops updatekeys secrets&amp;#x2F;anubis.yaml
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Now the bootstrap deploy, building on an existing host that already has tokens:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix run nixpkgs#nixos-rebuild -- switch \
  --flake .#anubis \
  --target-host root@anubis.example.com \
  --build-host root@builder.example.com
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The builder compiles the full system closure — fetching private inputs with its own decrypted tokens — and ships it to the new host. NixOS activates, sops-nix runs, and suddenly &lt;code&gt;anubis&lt;&#x2F;code&gt; has its own decrypted access tokens sitting in &lt;code&gt;&#x2F;run&#x2F;secrets&#x2F;&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;From this point forward, the host can build its own configuration. Subsequent deploys are straightforward:&lt;&#x2F;p&gt;
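&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# .#anubis resolves on anubis itself, so this assumes a checkout of the flake in the remote working directory
ssh anubis nixos-rebuild switch --flake .#anubis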
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The bootstrap problem is a one-time obstacle. You climb over it once with &lt;code&gt;--build-host&lt;&#x2F;code&gt;, and then it’s gone.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;when-the-build-host-is-your-local-machine&quot;&gt;When the build host is your local machine&lt;&#x2F;h2&gt;
&lt;p&gt;You don’t always have a separate build server. If your local machine has the tokens (because you’re working from a checkout of the flake and your Nix config includes the access tokens), you can omit &lt;code&gt;--build-host&lt;&#x2F;code&gt; entirely:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix run nixpkgs#nixos-rebuild -- switch \
  --flake .#myhost \
  --target-host root@myhost.example.com
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This builds locally and copies the closure to the target. If you’re on a Mac, this means cross-compiling for Linux (or using a remote builder), which is its own adventure. But if you’re on a Linux workstation with the right tokens configured, this is the simplest path.&lt;&#x2F;p&gt;
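&lt;p&gt;For the remote-builder route on a Mac, a sketch of the local side, assuming nix-darwin manages your Nix configuration and that &lt;code&gt;builder.example.com&lt;&#x2F;code&gt; accepts root SSH logins; the key path and job count are placeholders. Linux derivations are then dispatched to the builder, while evaluation, and with it the token-authenticated fetching of flake inputs, stays on your machine:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  nix.distributedBuilds = true;
  nix.buildMachines = [
    {
      hostName = &amp;quot;builder.example.com&amp;quot;;
      sshUser = &amp;quot;root&amp;quot;;
      # Placeholder: a key the nix-daemon (running as root) can read
      sshKey = &amp;quot;&amp;#x2F;var&amp;#x2F;root&amp;#x2F;.ssh&amp;#x2F;id_ed25519&amp;quot;;
      # Derivations for this platform may be built on this machine
      systems = [ &amp;quot;x86_64-linux&amp;quot; ];
      maxJobs = 4;
    }
  ];
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;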
&lt;h2 id=&quot;beyond-bootstrap&quot;&gt;Beyond bootstrap&lt;&#x2F;h2&gt;
&lt;p&gt;The &lt;code&gt;--build-host&lt;&#x2F;code&gt; &#x2F; &lt;code&gt;--target-host&lt;&#x2F;code&gt; pattern isn’t only useful for the sops bootstrap. It’s the right tool any time the target can’t or shouldn’t build its own config:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Underpowered hardware&lt;&#x2F;strong&gt; — Raspberry Pis, small VPSes, anything where a full Nix build would take an hour or swap itself to death.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Cross-compilation avoidance&lt;&#x2F;strong&gt; — building an aarch64 config on an x86_64 host (or vice versa) via a same-architecture build host is often faster than cross-compiling locally.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Minimal attack surface&lt;&#x2F;strong&gt; — some production hosts intentionally don’t have Nix build capabilities. They receive pre-built closures and activate them. No compiler, no fetcher, no GitHub tokens on the box at all.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;The bootstrap problem is the most annoying instance of needing this pattern, but it’s far from the only one.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-full-picture&quot;&gt;The full picture&lt;&#x2F;h2&gt;
&lt;p&gt;Putting it all together, here’s the lifecycle of a new NixOS host from bare metal to self-sufficient:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;nixos-anywhere&lt;&#x2F;strong&gt; installs NixOS onto the target (minimal config, no secrets)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;ssh-keyscan | ssh-to-age&lt;&#x2F;strong&gt; gets the new host’s age public key&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;sops updatekeys&lt;&#x2F;strong&gt; re-encrypts secrets for the new key&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;nixos-rebuild --build-host&lt;&#x2F;strong&gt; does the first deploy using an existing host’s tokens&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;sops-nix activates&lt;&#x2F;strong&gt; and decrypts access tokens on the target&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;The host can now self-build&lt;&#x2F;strong&gt; — subsequent deploys need no external builder&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;Steps 2-4 are the bootstrap dance. You do it once per host, and then you never think about it again — until the next fresh install, when you’ll wish you had bookmarked this post.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Deploying NixOS on OVH Kimsufi with nixos-anywhere</title>
        <published>2026-03-14T20:00:00+00:00</published>
        <updated>2026-03-14T20:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/nixos-anywhere-ovh-kimsufi/"/>
        <id>https://perlpimp.net/blog/nixos-anywhere-ovh-kimsufi/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/nixos-anywhere-ovh-kimsufi/">&lt;p&gt;So you’ve found yourself a cheap dedicated server on OVH Kimsufi (now branded “OVH Eco”, because marketing), and you want to run NixOS on it. OVH doesn’t offer NixOS as an install option. What they do offer is a rescue mode that gives you root SSH access to a Debian live environment. That’s all &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;nix-community&#x2F;nixos-anywhere&quot;&gt;nixos-anywhere&lt;&#x2F;a&gt; needs.&lt;&#x2F;p&gt;
&lt;p&gt;This is a walkthrough of how I deployed NixOS onto a Kimsufi server with two 4TB HGST disks, using disko for declarative partitioning and mdadm RAID-0, and the various traps I walked into along the way.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-disk-layout&quot;&gt;The disk layout&lt;&#x2F;h2&gt;
&lt;p&gt;Kimsufi servers in the lower tiers come with two spinning rust disks. I wanted both of them in a RAID-0 for maximum usable space.&lt;&#x2F;p&gt;
&lt;p&gt;Here’s the full disko config:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  disko.devices = {
    disk = {
      disk0 = {
        type = &amp;quot;disk&amp;quot;;
        device = &amp;quot;&amp;#x2F;dev&amp;#x2F;disk&amp;#x2F;by-id&amp;#x2F;ata-HGST_HUS726T4TALA6L1_V6GXVWPS&amp;quot;;
        content = {
          type = &amp;quot;gpt&amp;quot;;
          partitions = {
            esp = {
              size = &amp;quot;512M&amp;quot;;
              type = &amp;quot;EF00&amp;quot;;
              content = {
                type = &amp;quot;filesystem&amp;quot;;
                format = &amp;quot;vfat&amp;quot;;
                mountpoint = &amp;quot;&amp;#x2F;boot&amp;#x2F;ESP0&amp;quot;;
                mountOptions = [ &amp;quot;umask=0077&amp;quot; ];
              };
            };
            mdadm-root = {
              size = &amp;quot;100%&amp;quot;;
              content = {
                type = &amp;quot;mdraid&amp;quot;;
                name = &amp;quot;root&amp;quot;;
              };
            };
          };
        };
      };
      disk1 = {
        type = &amp;quot;disk&amp;quot;;
        device = &amp;quot;&amp;#x2F;dev&amp;#x2F;disk&amp;#x2F;by-id&amp;#x2F;ata-HGST_HUS726T4TALA6L1_V6GEAMHS&amp;quot;;
        content = {
          type = &amp;quot;gpt&amp;quot;;
          partitions = {
            esp = {
              size = &amp;quot;512M&amp;quot;;
              type = &amp;quot;EF00&amp;quot;;
              content = {
                type = &amp;quot;filesystem&amp;quot;;
                format = &amp;quot;vfat&amp;quot;;
                mountpoint = &amp;quot;&amp;#x2F;boot&amp;#x2F;ESP1&amp;quot;;
                mountOptions = [ &amp;quot;umask=0077&amp;quot; ];
              };
            };
            mdadm-root = {
              size = &amp;quot;100%&amp;quot;;
              content = {
                type = &amp;quot;mdraid&amp;quot;;
                name = &amp;quot;root&amp;quot;;
              };
            };
          };
        };
      };
    };
    mdadm = {
      root = {
        type = &amp;quot;mdadm&amp;quot;;
        level = 0;
        content = {
          type = &amp;quot;filesystem&amp;quot;;
          format = &amp;quot;ext4&amp;quot;;
          mountpoint = &amp;quot;&amp;#x2F;&amp;quot;;
        };
      };
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
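&lt;p&gt;For reference, a sketch of how a disko config like this is typically wired into the flake output that &lt;code&gt;.#my-server&lt;&#x2F;code&gt; points at. The file names &lt;code&gt;disk-config.nix&lt;&#x2F;code&gt; and &lt;code&gt;configuration.nix&lt;&#x2F;code&gt;, the nixpkgs branch, and the &lt;code&gt;x86_64-linux&lt;&#x2F;code&gt; system are assumptions:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  inputs = {
    nixpkgs.url = &amp;quot;github:NixOS&amp;#x2F;nixpkgs&amp;#x2F;nixos-unstable&amp;quot;;
    disko.url = &amp;quot;github:nix-community&amp;#x2F;disko&amp;quot;;
    disko.inputs.nixpkgs.follows = &amp;quot;nixpkgs&amp;quot;;
  };

  outputs = { nixpkgs, disko, ... }: {
    nixosConfigurations.my-server = nixpkgs.lib.nixosSystem {
      system = &amp;quot;x86_64-linux&amp;quot;;
      modules = [
        # Provides the disko.devices option and derives fileSystems from it
        disko.nixosModules.disko
        .&amp;#x2F;disk-config.nix
        .&amp;#x2F;configuration.nix
      ];
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;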
&lt;h2 id=&quot;the-boot-config-that-actually-works&quot;&gt;The boot config that actually works&lt;&#x2F;h2&gt;
&lt;p&gt;Disko creates the RAID arrays, but it does &lt;em&gt;not&lt;&#x2F;em&gt; tell NixOS to assemble them at boot. That’s on you. If you forget this line:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;boot.swraid.enable = true;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;…congratulations, you’ve just installed a system that will never boot. You’ll find this out after waiting for the server to come back up, panicking, entering rescue mode, and staring at an initrd that has no idea what mdadm is.&lt;&#x2F;p&gt;
&lt;p&gt;The full boot section:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;boot.loader.systemd-boot.enable = false;
boot.loader.grub = {
  enable = true;
  efiSupport = true;
  mirroredBoots = [
    { devices = [ &amp;quot;nodev&amp;quot; ]; path = &amp;quot;&amp;#x2F;boot&amp;#x2F;ESP0&amp;quot;; }
    { devices = [ &amp;quot;nodev&amp;quot; ]; path = &amp;quot;&amp;#x2F;boot&amp;#x2F;ESP1&amp;quot;; }
  ];
};
boot.loader.efi.canTouchEfiVariables = true;
boot.swraid.enable = true;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;mirroredBoots&lt;&#x2F;code&gt; option is the key piece: GRUB installs itself to both ESPs, so either disk can boot the system. On this RAID-0 that buys little, since losing either disk takes the root filesystem with it (and the data here is purely ephemeral anyway). On a RAID-1 holding data that matters, mirrored boot is what keeps the system bootable after a disk dies. The &lt;code&gt;&quot;nodev&quot;&lt;&#x2F;code&gt; device means “don’t try to install to an MBR, this is pure EFI”. And &lt;code&gt;swraid.enable&lt;&#x2F;code&gt;: don’t forget it. I’ll say it again. Don’t forget it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;running-nixos-anywhere&quot;&gt;Running nixos-anywhere&lt;&#x2F;h2&gt;
&lt;h3 id=&quot;disable-ovh-monitoring-first&quot;&gt;Disable OVH monitoring first&lt;&#x2F;h3&gt;
&lt;p&gt;Before you do anything else, go to the OVH panel and disable their monitoring. The kexec step replaces the running kernel in-place, which looks to OVH’s monitoring like your server just died. This triggers an automated “intervention” that can temporarily lock your netboot settings in the panel. Ask me how I know.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;the-install&quot;&gt;The install&lt;&#x2F;h3&gt;
&lt;p&gt;Boot the server into OVH rescue mode from the panel. Once you can SSH in as root, from your local machine:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix run github:nix-community&amp;#x2F;nixos-anywhere -- \
  --flake .#my-server \
  --phases kexec,disko,install \
  root@x.x.x.x
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;--phases kexec,disko,install&lt;&#x2F;code&gt; is not optional. This is the single most important flag and the hill I will die on.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;why-no-reboot-phase&quot;&gt;Why no reboot phase&lt;&#x2F;h3&gt;
&lt;p&gt;Kimsufi rescue mode is netboot-based. Every time the server reboots, it checks the OVH panel for which boot device to use. If it’s still set to “rescue”, you’re back in Debian. If nixos-anywhere includes the &lt;code&gt;reboot&lt;&#x2F;code&gt; phase (which is the default), it will happily reboot the server right back into rescue mode, not into your freshly installed NixOS.&lt;&#x2F;p&gt;
&lt;p&gt;The full workflow is:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Disable OVH monitoring in the panel (if you haven’t already — seriously, do this)&lt;&#x2F;li&gt;
&lt;li&gt;Boot the server into OVH rescue mode&lt;&#x2F;li&gt;
&lt;li&gt;nixos-anywhere installs with &lt;code&gt;--phases kexec,disko,install&lt;&#x2F;code&gt; (no reboot)&lt;&#x2F;li&gt;
&lt;li&gt;Go to the OVH panel, change netboot from “rescue” to “Boot from the hard disk”&lt;&#x2F;li&gt;
&lt;li&gt;Reboot the server from the OVH panel&lt;&#x2F;li&gt;
&lt;li&gt;NixOS boots&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;h3 id=&quot;secrets-management-dance&quot;&gt;Secrets management dance&lt;&#x2F;h3&gt;
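&lt;p&gt;nixos-anywhere generates a new SSH host key on every run. If you’re using sops-nix (and you should be), the age key derived from that host key will change. Once the server has booted into the installed NixOS (before that reboot you are still talking to the installer environment, whose host key is not the one the new system ends up with):&lt;&#x2F;p&gt;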
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;# Get the new age key
ssh-keyscan x.x.x.x 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null | ssh-to-age

# Update .sops.yaml with the new key, then:
sops updatekeys secrets&amp;#x2F;my-server.yaml
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then on first real boot, push the updated secrets:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix shell nixpkgs#nixos-rebuild -c nixos-rebuild switch \
  --flake .#my-server --target-host root@x.x.x.x
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;verify-before-you-deploy&quot;&gt;Verify before you deploy&lt;&#x2F;h3&gt;
&lt;p&gt;Each failed boot on Kimsufi costs you real time — you have to enter rescue mode, wait for it, SSH in, re-run nixos-anywhere, wait for that, change the boot device again, reboot again. Before you commit to a deploy, check the critical bits:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix eval .#nixosConfigurations.my-server.config.boot.swraid.enable
# should print: true

nix eval .#nixosConfigurations.my-server.config.boot.loader.grub.efiSupport
# should print: true
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;A 30-second &lt;code&gt;nix eval&lt;&#x2F;code&gt; can save you 30 minutes of rescue mode cycling.&lt;&#x2F;p&gt;
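&lt;p&gt;The same checks can also live in the configuration itself, so that a build of a system that cannot boot fails before nixos-anywhere touches the disks. A sketch using the NixOS &lt;code&gt;assertions&lt;&#x2F;code&gt; option; the messages are illustrative:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ config, ... }:
{
  # Checked whenever the system is built: a false assertion aborts the build
  assertions = [
    {
      assertion = config.boot.swraid.enable;
      message = &amp;quot;Root is on mdadm RAID: set boot.swraid.enable = true or the initrd cannot assemble it.&amp;quot;;
    }
    {
      assertion = config.boot.loader.grub.efiSupport;
      message = &amp;quot;The mirrored ESP layout needs boot.loader.grub.efiSupport = true.&amp;quot;;
    }
  ];
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;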
&lt;h2 id=&quot;the-java-web-start-kvm-a-portal-to-2008&quot;&gt;The Java Web Start KVM: a portal to 2008&lt;&#x2F;h2&gt;
&lt;p&gt;Kimsufi servers don’t have standard IPMI&#x2F;BMC consoles. What they have is an “IP KVM” that gives you a &lt;code&gt;.jnlp&lt;&#x2F;code&gt; file — a Java Web Start launcher. In 2026. Oracle killed Java Web Start in Java 11 (2018). OVH apparently did not get the memo.&lt;&#x2F;p&gt;
&lt;p&gt;You can’t just double-click a &lt;code&gt;.jnlp&lt;&#x2F;code&gt; file on a modern system and have anything happen. You need a Java 8 runtime and a way to parse the JNLP XML, download the referenced JARs, and launch them with the right classpath. This is exactly the kind of problem Nix was born to solve.&lt;&#x2F;p&gt;
&lt;p&gt;Here’s a standalone &lt;code&gt;jnlp-run.nix&lt;&#x2F;code&gt; you can use without pulling in anyone’s entire dotfiles flake:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;# jnlp-run.nix — run ancient JNLP files with nix-shell
# Usage: nix-shell jnlp-run.nix --run &amp;quot;jnlp-run &amp;#x2F;path&amp;#x2F;to&amp;#x2F;kvm.jnlp&amp;quot;
let
  pkgs = import &amp;lt;nixpkgs&amp;gt; { };
  jdk = if pkgs.stdenv.isDarwin then pkgs.zulu8 else pkgs.jdk8;
in
pkgs.mkShell {
  packages = [
    (pkgs.writeShellScriptBin &amp;quot;jnlp-run&amp;quot; &amp;#x27;&amp;#x27;
      export PATH=&amp;quot;${pkgs.lib.makeBinPath [ jdk pkgs.xmlstarlet pkgs.curl pkgs.coreutils ]}&amp;quot;

      if [ -z &amp;quot;$1&amp;quot; ]; then
        echo &amp;quot;Usage: jnlp-run &amp;lt;file.jnlp&amp;gt;&amp;quot; &amp;gt;&amp;amp;2
        exit 1
      fi

      jnlp=&amp;quot;$1&amp;quot;

      codebase=$(xml sel -t -v &amp;#x27;&amp;#x2F;jnlp&amp;#x2F;@codebase&amp;#x27; &amp;quot;$jnlp&amp;quot;)
      jar_href=$(xml sel -t -v &amp;#x27;&amp;#x2F;jnlp&amp;#x2F;resources&amp;#x2F;jar&amp;#x2F;@href&amp;#x27; &amp;quot;$jnlp&amp;quot;)
      jar_url=&amp;quot;&amp;#x27;&amp;#x27;${codebase}&amp;#x2F;&amp;#x27;&amp;#x27;${jar_href}&amp;quot;

      os=$(uname -s)
      arch=$(uname -m)
      case &amp;quot;&amp;#x27;&amp;#x27;${os}-&amp;#x27;&amp;#x27;${arch}&amp;quot; in
        Linux-x86_64)  native_href=$(xml sel -t -v \
          &amp;#x27;&amp;#x2F;&amp;#x2F;resources[@os=&amp;quot;Linux&amp;quot; and (@arch=&amp;quot;x86_64&amp;quot; or @arch=&amp;quot;amd64&amp;quot;)]&amp;#x2F;nativelib&amp;#x2F;@href&amp;#x27; \
          &amp;quot;$jnlp&amp;quot; 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null) ;;
        Linux-i*86)    native_href=$(xml sel -t -v \
          &amp;#x27;&amp;#x2F;&amp;#x2F;resources[@os=&amp;quot;Linux&amp;quot; and (@arch=&amp;quot;x86&amp;quot; or @arch=&amp;quot;i386&amp;quot;)]&amp;#x2F;nativelib&amp;#x2F;@href&amp;#x27; \
          &amp;quot;$jnlp&amp;quot; 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null) ;;
        *)             native_href=&amp;quot;&amp;quot; ;;
      esac

      mapfile -t args &amp;lt; &amp;lt;(xml sel -t -v &amp;#x27;&amp;#x2F;jnlp&amp;#x2F;application-desc&amp;#x2F;argument&amp;#x27; -n &amp;quot;$jnlp&amp;quot;)

      tmpdir=$(mktemp -d)
      trap &amp;#x27;rm -rf &amp;quot;$tmpdir&amp;quot;&amp;#x27; EXIT

      echo &amp;quot;Downloading $jar_url ...&amp;quot;
      curl -ksSL -o &amp;quot;$tmpdir&amp;#x2F;app.jar&amp;quot; &amp;quot;$jar_url&amp;quot;

      if [ -n &amp;quot;$native_href&amp;quot; ]; then
        native_url=&amp;quot;&amp;#x27;&amp;#x27;${codebase}&amp;#x2F;&amp;#x27;&amp;#x27;${native_href}&amp;quot;
        echo &amp;quot;Downloading native lib $native_url ...&amp;quot;
        curl -ksSL -o &amp;quot;$tmpdir&amp;#x2F;native.jar&amp;quot; &amp;quot;$native_url&amp;quot;
        (cd &amp;quot;$tmpdir&amp;quot; &amp;amp;&amp;amp; jar xf native.jar)
      fi

      echo &amp;quot;Launching with $(java -version 2&amp;gt;&amp;amp;1 | head -1) ...&amp;quot;
      # Not exec: the EXIT trap only runs (and removes $tmpdir) if this shell outlives java
      java -Djava.library.path=&amp;quot;$tmpdir&amp;quot; -jar &amp;quot;$tmpdir&amp;#x2F;app.jar&amp;quot; &amp;quot;&amp;#x27;&amp;#x27;${args[@]}&amp;quot;
    &amp;#x27;&amp;#x27;)
  ];
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Save that as &lt;code&gt;jnlp-run.nix&lt;&#x2F;code&gt;, download the &lt;code&gt;.jnlp&lt;&#x2F;code&gt; from the OVH panel, and run:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix-shell jnlp-run.nix --run &amp;quot;jnlp-run ~&amp;#x2F;Downloads&amp;#x2F;kvm_x.x.x.x.jnlp&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Nix pulls Java 8 (Zulu on macOS, OpenJDK on Linux), xmlstarlet, and curl into an ephemeral shell. The script parses the JNLP XML, downloads the KVM client JAR, fetches and unpacks a native library JAR if the JNLP lists one for your Linux architecture, and launches the client. When you exit the shell, nothing is left on your PATH or in your profile; the store paths stick around only until the next garbage collection. No system-wide Java 8 installation polluting your machine.&lt;&#x2F;p&gt;
&lt;p&gt;It’s a thoroughly modern solution to a thoroughly ancient problem. You get a VGA console view of your server that looks like it was rendered by a GPU from the Bush administration, but it works — and sometimes “it works” is all you need to figure out why your boot is hanging.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-hard-won-lessons-summarized&quot;&gt;The hard-won lessons, summarized&lt;&#x2F;h2&gt;
&lt;p&gt;For the skimmers and the future-me who will inevitably forget all of this:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;--phases kexec,disko,install&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; — always. No reboot phase. Ever.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Disable OVH monitoring&lt;&#x2F;strong&gt; before kexec, or enjoy your locked panel.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;boot.swraid.enable = true&lt;&#x2F;code&gt;&lt;&#x2F;strong&gt; — disko won’t set this for you.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;UEFI, not BIOS&lt;&#x2F;strong&gt; — use &lt;code&gt;EF00&lt;&#x2F;code&gt; partitions, not &lt;code&gt;EF02&lt;&#x2F;code&gt;.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;Age keys change&lt;&#x2F;strong&gt; on every nixos-anywhere run — update sops immediately.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;nix eval&lt;&#x2F;code&gt; your boot config&lt;&#x2F;strong&gt; before deploying — rescue mode round-trips are not fun.&lt;&#x2F;li&gt;
&lt;li&gt;&lt;strong&gt;The KVM is Java Web Start&lt;&#x2F;strong&gt; — Nix can run it, your OS cannot.&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
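&lt;p&gt;The &lt;code&gt;nix eval&lt;&#x2F;code&gt; lesson is easy to turn into a habit with a tiny pre-flight helper. This is a minimal sketch, not something from the setup above: &lt;code&gt;check_opt&lt;&#x2F;code&gt; and &lt;code&gt;preflight&lt;&#x2F;code&gt; are hypothetical names, the script assumes it runs from the flake checkout, and it only checks &lt;code&gt;boot.swraid.enable&lt;&#x2F;code&gt;. Extend it with whatever your own host depends on.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;#!&amp;#x2F;usr&amp;#x2F;bin&amp;#x2F;env bash
# Hypothetical pre-flight check, run from the flake checkout before nixos-anywhere.

# check_opt NAME ACTUAL WANTED: report one option, fail on mismatch
check_opt() {
  if [ &amp;quot;$2&amp;quot; = &amp;quot;$3&amp;quot; ]; then
    echo &amp;quot;ok   $1 = $2&amp;quot;
  else
    echo &amp;quot;FAIL $1 = $2 (want $3)&amp;quot; &amp;gt;&amp;amp;2
    return 1
  fi
}

# preflight HOST: evaluate options locally instead of discovering them in rescue mode
preflight() {
  local cfg=&amp;quot;.#nixosConfigurations.$1.config&amp;quot;
  check_opt boot.swraid.enable &amp;quot;$(nix eval &amp;quot;$cfg.boot.swraid.enable&amp;quot;)&amp;quot; true
}

# usage (host name and address are placeholders):
#   preflight myhost &amp;amp;&amp;amp;
#     nix run github:nix-community&amp;#x2F;nixos-anywhere -- \
#       --phases kexec,disko,install --flake .#myhost root@&amp;lt;server-ip&amp;gt;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Because &lt;code&gt;nix eval&lt;&#x2F;code&gt; prints Nix values, a boolean option comes back as the bare word &lt;code&gt;true&lt;&#x2F;code&gt;, which keeps the comparison a plain string match.&lt;&#x2F;p&gt;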
&lt;h2 id=&quot;wrapping-up&quot;&gt;Wrapping up&lt;&#x2F;h2&gt;
&lt;p&gt;From a blank Kimsufi server in rescue mode to a fully declarative NixOS system with RAID-0, mirrored boot, WireGuard tunnels, encrypted secrets, and remote deployment — all in an evening’s work whilst sipping a good Spanish Alhambra beer. There are worse ways to spend a Friday night.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>A Lightweight Prometheus Exporter for Bhyve VMs</title>
        <published>2026-03-13T16:00:00+00:00</published>
        <updated>2026-03-13T16:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/prometheus-bhyve-exporter/"/>
        <id>https://perlpimp.net/blog/prometheus-bhyve-exporter/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/prometheus-bhyve-exporter/">&lt;p&gt;I run a handful of bhyve VMs on FreeBSD, managed by &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;churchers&#x2F;vm-bhyve&quot;&gt;vm-bhyve&lt;&#x2F;a&gt;. Nothing exotic — just a straightforward setup without CBSD, RCTL, or RACCT. I wanted basic visibility into what each VM was doing: is it up, how much CPU is it burning, how much memory is it actually using. The kind of thing you glance at on a Grafana dashboard and move on.&lt;&#x2F;p&gt;
&lt;p&gt;The existing option for this on FreeBSD is &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;tycho-kirchner&#x2F;rctl_exporter&quot;&gt;rctl_exporter&lt;&#x2F;a&gt;, which instruments FreeBSD’s resource control framework. It’s a solid tool, but it’s designed for a broader use case — RACCT&#x2F;RCTL accounting across jails, processes, and login classes. If you’re already using RCTL limits and want to monitor those, it makes sense. But if you’re running a simple vm-bhyve setup without RACCT enabled, pulling in the full RCTL machinery just to see if your VMs are alive and how much memory they’re eating is overkill.&lt;&#x2F;p&gt;
&lt;p&gt;So I wrote &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;ijohanne&#x2F;prometheus-bhyve-exporter&quot;&gt;prometheus-bhyve-exporter&lt;&#x2F;a&gt;.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-it-does&quot;&gt;What it does&lt;&#x2F;h2&gt;
&lt;p&gt;The exporter is a small Rust binary that parses the output of &lt;code&gt;vm list&lt;&#x2F;code&gt; (from vm-bhyve) and queries &lt;code&gt;ps&lt;&#x2F;code&gt; for each running VM’s bhyve process. On each Prometheus scrape it collects:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;bhyve_vm_up&lt;&#x2F;code&gt; — whether the VM is running (1) or stopped (0)&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;bhyve_vm_cpu_allocated&lt;&#x2F;code&gt; — number of CPUs allocated&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;bhyve_vm_memory_allocated_bytes&lt;&#x2F;code&gt; — memory allocated in bytes&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;bhyve_vm_cpu_usage_percent&lt;&#x2F;code&gt; — current CPU usage from ps&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;bhyve_vm_memory_rss_bytes&lt;&#x2F;code&gt; — resident set size&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;bhyve_vm_memory_vsz_bytes&lt;&#x2F;code&gt; — virtual memory size&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;bhyve_vm_pid&lt;&#x2F;code&gt; — PID of the bhyve process&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;All seven per-VM metrics are gauges, and each carries a &lt;code&gt;vm&lt;&#x2F;code&gt; label with the VM name. There are also &lt;code&gt;bhyve_exporter_scrape_duration_seconds&lt;&#x2F;code&gt; and &lt;code&gt;bhyve_exporter_scrape_errors_total&lt;&#x2F;code&gt; for monitoring the exporter itself.&lt;&#x2F;p&gt;
&lt;p&gt;No kernel modules, no RACCT, no RCTL. Just &lt;code&gt;vm list&lt;&#x2F;code&gt; and &lt;code&gt;ps&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
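&lt;p&gt;To make the &lt;code&gt;vm list&lt;&#x2F;code&gt; half concrete, here is a rough bash sketch of how the allocation gauges can be derived from it. It is not the exporter’s Rust code. It assumes vm-bhyve’s column order (NAME, DATASTORE, LOADER, CPU, MEMORY, VNC, AUTO, STATE), a STATE of either &lt;code&gt;Stopped&lt;&#x2F;code&gt; or &lt;code&gt;Running (PID)&lt;&#x2F;code&gt;, and memory sizes written like &lt;code&gt;512M&lt;&#x2F;code&gt; or &lt;code&gt;2G&lt;&#x2F;code&gt;; &lt;code&gt;to_bytes&lt;&#x2F;code&gt; and &lt;code&gt;vm_list_to_metrics&lt;&#x2F;code&gt; are hypothetical names.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;#!&amp;#x2F;usr&amp;#x2F;bin&amp;#x2F;env bash
# Illustrative only: column order, state strings and size suffixes are
# assumptions about the `vm list` output of vm-bhyve.

# to_bytes SIZE: convert 512M or 2G style sizes to bytes (1024-based)
to_bytes() {
  local n=${1%[KMGT]} unit=${1##*[0-9]}
  case $unit in
    K) echo $((n * 1024)) ;;
    M) echo $((n * 1024 ** 2)) ;;
    G) echo $((n * 1024 ** 3)) ;;
    T) echo $((n * 1024 ** 4)) ;;
    *) echo &amp;quot;$n&amp;quot; ;;
  esac
}

# vm_list_to_metrics: read `vm list` output on stdin, print exposition lines
vm_list_to_metrics() {
  local name ds loader cpu mem rest up
  while read -r name ds loader cpu mem rest; do
    # skip blank lines and the header row
    [ -z &amp;quot;$name&amp;quot; ] || [ &amp;quot;$name&amp;quot; = NAME ] &amp;amp;&amp;amp; continue
    # AUTO may be two words (&amp;quot;Yes [1]&amp;quot;), so look for the state anywhere in the rest
    if [[ $rest == *&amp;quot;Running (&amp;quot;* ]]; then up=1; else up=0; fi
    printf &amp;#x27;bhyve_vm_up{vm=&amp;quot;%s&amp;quot;} %d\n&amp;#x27; &amp;quot;$name&amp;quot; &amp;quot;$up&amp;quot;
    printf &amp;#x27;bhyve_vm_cpu_allocated{vm=&amp;quot;%s&amp;quot;} %s\n&amp;#x27; &amp;quot;$name&amp;quot; &amp;quot;$cpu&amp;quot;
    printf &amp;#x27;bhyve_vm_memory_allocated_bytes{vm=&amp;quot;%s&amp;quot;} %s\n&amp;#x27; &amp;quot;$name&amp;quot; &amp;quot;$(to_bytes &amp;quot;$mem&amp;quot;)&amp;quot;
  done
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;On the host this would be fed with &lt;code&gt;vm list | vm_list_to_metrics&lt;&#x2F;code&gt;. The &lt;code&gt;ps&lt;&#x2F;code&gt;-based metrics (CPU percentage, RSS, VSZ, PID) need a separate lookup per running VM.&lt;&#x2F;p&gt;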
&lt;h2 id=&quot;installation-on-freebsd&quot;&gt;Installation on FreeBSD&lt;&#x2F;h2&gt;
&lt;p&gt;You need a Rust toolchain. The repo includes a Makefile and the scaffolding to build a proper FreeBSD package.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;from-source&quot;&gt;From source&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;git clone https:&amp;#x2F;&amp;#x2F;github.com&amp;#x2F;ijohanne&amp;#x2F;prometheus-bhyve-exporter.git
cd prometheus-bhyve-exporter
sudo make install
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This drops the binary into &lt;code&gt;&#x2F;usr&#x2F;local&#x2F;bin&#x2F;&lt;&#x2F;code&gt; and an rc.d script into &lt;code&gt;&#x2F;usr&#x2F;local&#x2F;etc&#x2F;rc.d&#x2F;&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;as-a-freebsd-package&quot;&gt;As a FreeBSD package&lt;&#x2F;h3&gt;
&lt;p&gt;If you prefer a proper &lt;code&gt;.pkg&lt;&#x2F;code&gt; that shows up in &lt;code&gt;pkg info&lt;&#x2F;code&gt; and can be cleanly removed:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;git clone https:&amp;#x2F;&amp;#x2F;github.com&amp;#x2F;ijohanne&amp;#x2F;prometheus-bhyve-exporter.git
cd prometheus-bhyve-exporter
make package
sudo pkg add prometheus-bhyve-exporter-0.1.0.pkg
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h3 id=&quot;enable-and-start-the-service&quot;&gt;Enable and start the service&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;sudo sysrc bhyve_exporter_enable=YES
sudo service bhyve_exporter start
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The exporter listens on &lt;code&gt;0.0.0.0:9288&lt;&#x2F;code&gt; by default. You can override the listen address and the path to the &lt;code&gt;vm&lt;&#x2F;code&gt; command via &lt;code&gt;rc.conf&lt;&#x2F;code&gt;:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;sudo sysrc bhyve_exporter_listen_address=&amp;quot;127.0.0.1:9288&amp;quot;
sudo sysrc bhyve_exporter_vm_command=&amp;quot;&amp;#x2F;usr&amp;#x2F;local&amp;#x2F;sbin&amp;#x2F;vm&amp;quot;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;h2 id=&quot;scraping-with-prometheus&quot;&gt;Scraping with Prometheus&lt;&#x2F;h2&gt;
&lt;p&gt;Point Prometheus at it the usual way:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;yaml&quot; class=&quot;language-yaml &quot;&gt;&lt;code class=&quot;language-yaml&quot; data-lang=&quot;yaml&quot;&gt;scrape_configs:
  - job_name: bhyve
    static_configs:
      - targets: [&amp;#x27;your-freebsd-host:9288&amp;#x27;]
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;If you’re managing Prometheus through NixOS (like I am), the equivalent entry in &lt;code&gt;services.prometheus.scrapeConfigs&lt;&#x2F;code&gt; looks like:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  job_name = &amp;quot;bhyve&amp;quot;;
  honor_labels = true;
  static_configs = [
    {
      targets = [ &amp;quot;your-freebsd-host:9288&amp;quot; ];
      labels = { instance = &amp;quot;your-freebsd-host&amp;quot;; };
    }
  ];
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
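&lt;p&gt;Once the target is up, the same data is handy from a shell. A minimal sketch, assuming the exporter serves the conventional &lt;code&gt;&#x2F;metrics&lt;&#x2F;code&gt; path and the &lt;code&gt;bhyve_vm_up&lt;&#x2F;code&gt; series with a &lt;code&gt;vm&lt;&#x2F;code&gt; label described above; &lt;code&gt;vms_down&lt;&#x2F;code&gt; is a hypothetical helper that lists every VM reporting 0.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;#!&amp;#x2F;usr&amp;#x2F;bin&amp;#x2F;env bash
# vms_down: read Prometheus exposition text on stdin and print the vm label
# of every bhyve_vm_up sample whose value is 0 (a stopped VM)
vms_down() {
  awk &amp;#x27;&amp;#x2F;^bhyve_vm_up[{]&amp;#x2F; &amp;amp;&amp;amp; $NF == 0 {
    match($0, &amp;#x2F;vm=&amp;quot;[^&amp;quot;]*&amp;quot;&amp;#x2F;)
    print substr($0, RSTART + 4, RLENGTH - 5)
  }&amp;#x27;
}

# usage (host name is a placeholder):
#   curl -fsS http:&amp;#x2F;&amp;#x2F;your-freebsd-host:9288&amp;#x2F;metrics | vms_down
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The &lt;code&gt;substr&lt;&#x2F;code&gt; offsets simply strip the leading &lt;code&gt;vm=&amp;quot;&lt;&#x2F;code&gt; and the trailing quote from the matched label.&lt;&#x2F;p&gt;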
&lt;h2 id=&quot;grafana-dashboard&quot;&gt;Grafana dashboard&lt;&#x2F;h2&gt;
&lt;p&gt;The repo includes a &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;ijohanne&#x2F;prometheus-bhyve-exporter&#x2F;blob&#x2F;master&#x2F;grafana&#x2F;dashboard.json&quot;&gt;Grafana dashboard&lt;&#x2F;a&gt; that you can import directly. It gives you an overview of VM status, CPU usage, and memory consumption at a glance.&lt;&#x2F;p&gt;
&lt;p&gt;Here’s what the CPU and memory panels look like in practice:&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;perlpimp.net&#x2F;blog&#x2F;prometheus-bhyve-exporter&#x2F;cpu-usage.png&quot; alt=&quot;CPU usage per VM&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;&lt;img src=&quot;https:&#x2F;&#x2F;perlpimp.net&#x2F;blog&#x2F;prometheus-bhyve-exporter&#x2F;memory-rss.png&quot; alt=&quot;Memory RSS per VM&quot; &#x2F;&gt;&lt;&#x2F;p&gt;
&lt;p&gt;Nothing fancy — just the metrics that matter when you want to know if a VM is misbehaving or if you need to rebalance resources.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;future-plans&quot;&gt;Future plans&lt;&#x2F;h2&gt;
&lt;p&gt;I’m considering submitting this as a PR to the &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.freebsd.org&#x2F;ports&#x2F;&quot;&gt;FreeBSD ports tree&lt;&#x2F;a&gt; once I’ve run it for a while longer and I’m confident it’s stable enough. The port scaffolding is already in the repo, so the jump from “builds locally” to “available in ports” shouldn’t be a large one.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;contributing&quot;&gt;Contributing&lt;&#x2F;h2&gt;
&lt;p&gt;The project is MIT licensed and lives at &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;ijohanne&#x2F;prometheus-bhyve-exporter&quot;&gt;github.com&#x2F;ijohanne&#x2F;prometheus-bhyve-exporter&lt;&#x2F;a&gt;. If you run bhyve VMs and want to help make it better, forks and pull requests are very welcome, whether that’s new metrics, bug fixes, or packaging improvements.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Bootstrapping NixOS with a Template Generator</title>
        <published>2026-03-12T12:00:00+00:00</published>
        <updated>2026-03-12T12:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/bootstrapping-nixos-with-a-template-generator/"/>
        <id>https://perlpimp.net/blog/bootstrapping-nixos-with-a-template-generator/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/bootstrapping-nixos-with-a-template-generator/">&lt;p&gt;Every NixOS machine I manage starts the same way. A &lt;code&gt;configuration.nix&lt;&#x2F;code&gt; that imports the right modules. A &lt;code&gt;home.nix&lt;&#x2F;code&gt; that wires up the user’s shell, editor, and dev tooling. An entry in &lt;code&gt;flake.nix&lt;&#x2F;code&gt; that ties it all together. A deploy script that knows whether to rebuild locally or push to a remote host. A user registry entry with SSH keys and metadata.&lt;&#x2F;p&gt;
&lt;p&gt;None of this is hard. All of it is tedious, and tedious means error-prone. You copy an existing host directory, find-and-replace the hostname, forget to update the architecture, wonder why your aarch64 server is trying to pull x86_64 packages, and lose twenty minutes to a mistake that shouldn’t have been possible. I got tired of this, so I wrote a tool.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-setup-template&quot;&gt;The setup-template&lt;&#x2F;h2&gt;
&lt;p&gt;The &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;ijohanne&#x2F;dotfiles-ng&quot;&gt;dotfiles repo&lt;&#x2F;a&gt; now ships a Rust CLI called &lt;code&gt;setup-template&lt;&#x2F;code&gt;. It scaffolds new host and user configurations for the flake — interactively or from a config file. You run it, answer some questions, and it produces the files you would have written by hand, minus the transcription errors.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix run .#setup-template -- new
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The wizard prompts for two things: who you are and what the machine is.&lt;&#x2F;p&gt;
&lt;p&gt;For the user, it asks for a username, full name, email, preferred shell, whether you want developer tooling, and any SSH public keys. For the host, it asks for a hostname, platform (Linux or Darwin), architecture, role (desktop or server), nixpkgs channel, deploy mode, and which optional modules to enable — secrets, neovim, and language toolchains.&lt;&#x2F;p&gt;
&lt;p&gt;When it finishes, it writes three files and prints a flake snippet to stdout:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;configs&#x2F;users.nix&lt;&#x2F;code&gt; — your user added to the shared registry&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;hosts&#x2F;&amp;lt;name&amp;gt;&#x2F;configuration.nix&lt;&#x2F;code&gt; — the system config for your new machine&lt;&#x2F;li&gt;
&lt;li&gt;&lt;code&gt;hosts&#x2F;&amp;lt;name&amp;gt;&#x2F;home.nix&lt;&#x2F;code&gt; — the home-manager config with your selected modules&lt;&#x2F;li&gt;
&lt;li&gt;A ready-to-paste &lt;code&gt;nixosConfigurations&lt;&#x2F;code&gt; or &lt;code&gt;darwinConfigurations&lt;&#x2F;code&gt; block&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;h2 id=&quot;what-this-looks-like-for-a-site-host&quot;&gt;What this looks like for a site host&lt;&#x2F;h2&gt;
&lt;p&gt;This site — perlpimp.net — is a Zola static site packaged as a Nix flake. The flake builds the HTML, compiles a CV to PDF via typst, and exports a NixOS module that sets up nginx with ACME, virtual hosts, domain redirects, and optional Plausible analytics. The module interface is small:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;services.perlpimpnet = {
  enable = true;
  domain = &amp;quot;perlpimp.net&amp;quot;;
  extraDomains = [ &amp;quot;www.perlpimp.net&amp;quot; ];
  analytics.plausible.enable = true;
};
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;To host this, you need a NixOS server. That server needs a system configuration, a user, deploy tooling, and probably secrets management for anything beyond the static site itself. This is exactly what the template generator produces.&lt;&#x2F;p&gt;
&lt;p&gt;You would run the wizard, select Linux, your architecture, and the server role, and enable secrets. The generator writes the host configuration with systemd-boot, a user account, and a deploy script that either rebuilds from a local checkout or pulls from GitHub over SSH. You paste the flake snippet, add the perlpimpnet flake as an input, import its NixOS module into your host config, and you have a deployable server.&lt;&#x2F;p&gt;
&lt;p&gt;The manual version of this process involves copying files from another host, editing half a dozen values across three files, and hoping you got the deploy script arguments right. The template version is a two-minute wizard followed by adding your site-specific imports. One of these scales. The other is how I used to do it.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;non-interactive-mode&quot;&gt;Non-interactive mode&lt;&#x2F;h2&gt;
&lt;p&gt;The wizard is fine for one-off setups, but if you are provisioning multiple hosts — say a web server, a build server, and a desktop — you write a JSON config and run the generator in batch mode:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix run .#setup-template -- generate --config setup.json
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The config file follows a versioned schema with arrays of users and hosts. Each host references its primary user by username, selects its modules, and declares its deploy mode. The generator validates the whole thing (primary users must exist in the users list, language selections must be from the supported set, Darwin hosts can’t be servers) and either produces all the files or tells you what’s wrong. No partial output, no half-generated state.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;json&quot; class=&quot;language-json &quot;&gt;&lt;code class=&quot;language-json&quot; data-lang=&quot;json&quot;&gt;{
  &amp;quot;version&amp;quot;: 1,
  &amp;quot;repo_ref&amp;quot;: &amp;quot;github:ijohanne&amp;#x2F;dotfiles-ng&amp;quot;,
  &amp;quot;users&amp;quot;: [{
    &amp;quot;username&amp;quot;: &amp;quot;ij&amp;quot;,
    &amp;quot;name&amp;quot;: &amp;quot;Ian Johannesen&amp;quot;,
    &amp;quot;email&amp;quot;: &amp;quot;ij@perlpimp.net&amp;quot;,
    &amp;quot;shell&amp;quot;: &amp;quot;fish&amp;quot;,
    &amp;quot;developer&amp;quot;: true,
    &amp;quot;ssh_keys&amp;quot;: [&amp;quot;ssh-ed25519 AAAA...&amp;quot;]
  }],
  &amp;quot;hosts&amp;quot;: [{
    &amp;quot;name&amp;quot;: &amp;quot;web01&amp;quot;,
    &amp;quot;platform&amp;quot;: &amp;quot;linux&amp;quot;,
    &amp;quot;arch&amp;quot;: &amp;quot;x86_64&amp;quot;,
    &amp;quot;role&amp;quot;: &amp;quot;server&amp;quot;,
    &amp;quot;nixpkgs&amp;quot;: &amp;quot;unstable&amp;quot;,
    &amp;quot;deploy_mode&amp;quot;: &amp;quot;remote&amp;quot;,
    &amp;quot;primary_user&amp;quot;: &amp;quot;ij&amp;quot;,
    &amp;quot;modules&amp;quot;: {
      &amp;quot;secrets&amp;quot;: true,
      &amp;quot;neovim&amp;quot;: true,
      &amp;quot;languages&amp;quot;: [&amp;quot;nix&amp;quot;]
    }
  }]
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This is the same config file you would check into version control if you wanted a record of how each machine was initialised. The generator is deterministic — same input, same output — so the config file doubles as documentation.&lt;&#x2F;p&gt;
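&lt;p&gt;The validation rules listed above are simple enough to restate in a few lines. This is a hypothetical bash rendering of those three checks, not the generator’s Rust code: &lt;code&gt;validate_host&lt;&#x2F;code&gt; and its argument order are invented for illustration, and the supported language list is taken from the wizard’s language prompt.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;#!&amp;#x2F;usr&amp;#x2F;bin&amp;#x2F;env bash
# Hypothetical restatement of the host checks; the real generator is Rust.
SUPPORTED_LANGS=&amp;quot;nix rust lua markdown flutter&amp;quot;

# validate_host HOST PLATFORM ROLE PRIMARY_USER &amp;quot;KNOWN USERS&amp;quot; &amp;quot;LANGUAGES&amp;quot;
# Prints one line per problem and returns 1 if there were any.
validate_host() {
  local host=$1 platform=$2 role=$3 primary=$4 users=$5 langs=$6
  local failed=0 lang
  if [[ &amp;quot; $users &amp;quot; != *&amp;quot; $primary &amp;quot;* ]]; then
    echo &amp;quot;$host: primary_user &amp;#x27;$primary&amp;#x27; is not in users&amp;quot;; failed=1
  fi
  if [ &amp;quot;$platform&amp;quot; = darwin ] &amp;amp;&amp;amp; [ &amp;quot;$role&amp;quot; = server ]; then
    echo &amp;quot;$host: darwin hosts cannot be servers&amp;quot;; failed=1
  fi
  for lang in $langs; do   # unquoted on purpose: split the space-separated list
    if [[ &amp;quot; $SUPPORTED_LANGS &amp;quot; != *&amp;quot; $lang &amp;quot;* ]]; then
      echo &amp;quot;$host: unsupported language &amp;#x27;$lang&amp;#x27;&amp;quot;; failed=1
    fi
  done
  return &amp;quot;$failed&amp;quot;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Calling &lt;code&gt;validate_host mac01 darwin server bob ij &amp;quot;nix cobol&amp;quot;&lt;&#x2F;code&gt; reports all three problems in one go instead of stopping at the first, which is what you want when nothing should be written until the whole config is clean.&lt;&#x2F;p&gt;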
&lt;h2 id=&quot;a-concrete-example-from-zero-to-serving-perlpimp-net&quot;&gt;A concrete example: from zero to serving perlpimp.net&lt;&#x2F;h2&gt;
&lt;p&gt;Say you have a VPS from any provider that supports kexec — Hetzner, netcup, OVH, most of the usual suspects. The box is running whatever stock Linux the provider gave you. You want it running NixOS, serving this site, with secrets management and a deploy script, and you want it done in under an hour.&lt;&#x2F;p&gt;
&lt;p&gt;Here is the whole process.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;step-1-scaffold-the-host&quot;&gt;Step 1: scaffold the host&lt;&#x2F;h3&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;cd ~&amp;#x2F;git&amp;#x2F;dotfiles
nix run .#setup-template -- new
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The wizard runs:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;── User ──────────────────────────────────
Username: ij
Full name: Ian Johannesen
Email: ij@perlpimp.net
Shell [fish&amp;#x2F;zsh&amp;#x2F;bash]: fish
Developer mode? [Y&amp;#x2F;n]: y
SSH public key (empty to finish): ssh-ed25519 AAAA... ij@macbook
SSH public key (empty to finish):

── Host ──────────────────────────────────
Hostname: web01
Platform [linux&amp;#x2F;darwin]: linux
Architecture [x86_64&amp;#x2F;aarch64]: x86_64
Role [desktop&amp;#x2F;server]: server
Nixpkgs channel [unstable&amp;#x2F;stable]: unstable
Deploy mode [local&amp;#x2F;remote]: remote
Enable secrets? [Y&amp;#x2F;n]: y
Enable neovim? [Y&amp;#x2F;n]: y
Languages [nix, rust, lua, markdown, flutter]: nix
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;It writes &lt;code&gt;configs&#x2F;users.nix&lt;&#x2F;code&gt;, &lt;code&gt;hosts&#x2F;web01&#x2F;configuration.nix&lt;&#x2F;code&gt;, &lt;code&gt;hosts&#x2F;web01&#x2F;home.nix&lt;&#x2F;code&gt;, and prints the flake snippet.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;step-2-add-disko-and-the-site-module&quot;&gt;Step 2: add disko and the site module&lt;&#x2F;h3&gt;
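&lt;p&gt;The generator gives you a working system config, but the VPS needs disk partitioning and your site needs its NixOS module. You create a &lt;code&gt;hosts&#x2F;web01&#x2F;disko.nix&lt;&#x2F;code&gt; for the VPS disk layout: a 1 MiB BIOS boot partition (&lt;code&gt;EF02&lt;&#x2F;code&gt;) for GRUB and the rest as ext4. This assumes the VPS boots via legacy BIOS, as many KVM VPSes still do; if yours boots UEFI, or your host config enables systemd-boot, use an &lt;code&gt;EF00&lt;&#x2F;code&gt; ESP mounted at &lt;code&gt;&#x2F;boot&lt;&#x2F;code&gt; instead. The BIOS layout looks like this:&lt;&#x2F;p&gt;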
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{
  disko.devices = {
    disk.main = {
      type = &amp;quot;disk&amp;quot;;
      device = &amp;quot;&amp;#x2F;dev&amp;#x2F;vda&amp;quot;;
      content = {
        type = &amp;quot;gpt&amp;quot;;
        partitions = {
          boot = {
            size = &amp;quot;1M&amp;quot;;
            type = &amp;quot;EF02&amp;quot;;
          };
          root = {
            size = &amp;quot;100%&amp;quot;;
            content = {
              type = &amp;quot;filesystem&amp;quot;;
              format = &amp;quot;ext4&amp;quot;;
              mountpoint = &amp;quot;&amp;#x2F;&amp;quot;;
            };
          };
        };
      };
    };
  };
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Then you edit &lt;code&gt;hosts&#x2F;web01&#x2F;configuration.nix&lt;&#x2F;code&gt; to import disko and the perlpimpnet module:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;{ inputs, config, pkgs, lib, user, ... }:

let
  deploy = import ..&amp;#x2F;..&amp;#x2F;configs&amp;#x2F;deploy { inherit pkgs; };
in
{
  imports = [
    ..&amp;#x2F;..&amp;#x2F;configs&amp;#x2F;server.nix
    .&amp;#x2F;hardware-configuration.nix
    .&amp;#x2F;disko.nix
    inputs.disko.nixosModules.disko
    inputs.perlpimpnet.nixosModules.default
  ];

  networking.hostName = &amp;quot;web01&amp;quot;;

  services.perlpimpnet = {
    enable = true;
    domain = &amp;quot;perlpimp.net&amp;quot;;
    extraDomains = [ &amp;quot;www.perlpimp.net&amp;quot; ];
    analytics.plausible.enable = true;
  };

  environment.systemPackages = [
    (deploy.mkDeployScript {
      name = &amp;quot;deploy-web01&amp;quot;;
      host = &amp;quot;web01&amp;quot;;
    })
  ];

  sops = {
    defaultSopsFile = ..&amp;#x2F;..&amp;#x2F;secrets&amp;#x2F;web01.yaml;
    age = {
      sshKeyPaths = [ &amp;quot;&amp;#x2F;etc&amp;#x2F;ssh&amp;#x2F;ssh_host_ed25519_key&amp;quot; ];
      keyFile = &amp;quot;&amp;#x2F;var&amp;#x2F;lib&amp;#x2F;sops-nix&amp;#x2F;key.txt&amp;quot;;
      generateKey = true;
    };
  };

  system.stateVersion = &amp;quot;25.11&amp;quot;;
}
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;You add &lt;code&gt;perlpimpnet&lt;&#x2F;code&gt; and &lt;code&gt;disko&lt;&#x2F;code&gt; as flake inputs, paste the generated flake snippet, and commit.&lt;&#x2F;p&gt;
&lt;h3 id=&quot;step-3-deploy-with-nixos-anywhere&quot;&gt;Step 3: deploy with nixos-anywhere&lt;&#x2F;h3&gt;
&lt;p&gt;The VPS is running stock Debian or Ubuntu. You don’t need to install NixOS manually; &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;nix-community&#x2F;nixos-anywhere&quot;&gt;nixos-anywhere&lt;&#x2F;a&gt; handles it. It connects to the running Linux system over SSH, kexecs into a NixOS installer environment, partitions the disk using your disko config, and installs your flake configuration. One command:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;nix run github:nix-community&amp;#x2F;nixos-anywhere -- --flake .#web01 root@&amp;lt;vps-ip&amp;gt;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;This partitions &lt;code&gt;&#x2F;dev&#x2F;vda&lt;&#x2F;code&gt; according to &lt;code&gt;disko.nix&lt;&#x2F;code&gt;, installs the full NixOS configuration including nginx, ACME, and the perlpimpnet module, and reboots. When it comes back up, nginx is serving the site; HTTPS follows once ACME has issued certificates, which only works if the domain’s DNS already points at the VPS. The whole thing takes a few minutes, most of which is the build and transfer.&lt;&#x2F;p&gt;
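&lt;p&gt;On a fresh install the first HTTPS requests can fail while ACME is still obtaining certificates, so a small retry loop saves refreshing by hand. This is a generic sketch: &lt;code&gt;wait_for&lt;&#x2F;code&gt; is a hypothetical helper and the retry numbers are arbitrary.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;#!&amp;#x2F;usr&amp;#x2F;bin&amp;#x2F;env bash
# wait_for TRIES DELAY CMD...: rerun CMD until it succeeds, sleeping DELAY
# seconds between attempts; return 1 after TRIES failed attempts
wait_for() {
  local tries=$1 delay=$2
  shift 2
  until &amp;quot;$@&amp;quot;; do
    tries=$((tries - 1))
    [ &amp;quot;$tries&amp;quot; -gt 0 ] || return 1
    sleep &amp;quot;$delay&amp;quot;
  done
}

# usage: give the site about five minutes to answer over HTTPS
#   wait_for 30 10 curl -fsS -o &amp;#x2F;dev&amp;#x2F;null https:&amp;#x2F;&amp;#x2F;perlpimp.net
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Since &lt;code&gt;curl -f&lt;&#x2F;code&gt; also fails on certificate errors, the loop only exits once a trusted certificate is in place, not merely when nginx answers.&lt;&#x2F;p&gt;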
&lt;h3 id=&quot;step-4-secrets-and-ongoing-deploys&quot;&gt;Step 4: secrets and ongoing deploys&lt;&#x2F;h3&gt;
&lt;p&gt;After the first boot, grab the host’s age key for sops-nix:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;ssh-to-age-remote root@&amp;lt;vps-ip&amp;gt;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Add it to &lt;code&gt;.sops.yaml&lt;&#x2F;code&gt;, create &lt;code&gt;secrets&#x2F;web01.yaml&lt;&#x2F;code&gt;, and you have encrypted secrets that only this host, plus any admin keys you list alongside it, can decrypt.&lt;&#x2F;p&gt;
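&lt;p&gt;&lt;code&gt;ssh-to-age-remote&lt;&#x2F;code&gt; is a helper whose source is not shown here. Assuming it derives the key from the host’s SSH ed25519 key, which is what the &lt;code&gt;sshKeyPaths&lt;&#x2F;code&gt; setting in the host config relies on, a rough equivalent with stock tools looks like this. It needs &lt;code&gt;ssh-keyscan&lt;&#x2F;code&gt; and &lt;code&gt;ssh-to-age&lt;&#x2F;code&gt; on your PATH; &lt;code&gt;host_age_key&lt;&#x2F;code&gt; is a hypothetical name.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;#!&amp;#x2F;usr&amp;#x2F;bin&amp;#x2F;env bash
# host_age_key HOST: fetch the ed25519 SSH host key of HOST and convert it to
# an age recipient for .sops.yaml. ssh-keyscan does not authenticate the host,
# so compare the key against a fingerprint you trust before relying on it.
host_age_key() {
  ssh-keyscan -t ed25519 &amp;quot;$1&amp;quot; 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null | ssh-to-age
}

# usage (placeholder address):
#   host_age_key &amp;lt;vps-ip&amp;gt;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;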
&lt;p&gt;From here on, deployments are a push and an SSH command:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;git push
ssh web01 deploy-web01
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;The deploy script on the host checks for a local checkout, falls back to GitHub if there isn’t one, and runs &lt;code&gt;nixos-rebuild switch --flake&lt;&#x2F;code&gt;. That is the entire workflow — template, customise, nixos-anywhere, deploy. No ISO to boot, no manual partitioning, no ansible playbooks, no forgetting which host has which configuration.&lt;&#x2F;p&gt;
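&lt;p&gt;The checkout-or-GitHub decision is the interesting part of that script. Below is a hypothetical sketch of the same logic; the real &lt;code&gt;mkDeployScript&lt;&#x2F;code&gt; output is not shown in this post, so the checkout path (borrowed from step 1) and the SSH flake URL (built from the &lt;code&gt;repo_ref&lt;&#x2F;code&gt; in the JSON example) are assumptions.&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;bash&quot; class=&quot;language-bash &quot;&gt;&lt;code class=&quot;language-bash&quot; data-lang=&quot;bash&quot;&gt;#!&amp;#x2F;usr&amp;#x2F;bin&amp;#x2F;env bash
# Hypothetical stand-in for the generated deploy script; the checkout path
# and remote URL are assumptions.
REMOTE_FLAKE=&amp;quot;git+ssh:&amp;#x2F;&amp;#x2F;git@github.com&amp;#x2F;ijohanne&amp;#x2F;dotfiles-ng&amp;quot;

# flake_ref CHECKOUT HOST: prefer a local checkout, else build from GitHub over SSH
flake_ref() {
  if [ -f &amp;quot;$1&amp;#x2F;flake.nix&amp;quot; ]; then
    echo &amp;quot;$1#$2&amp;quot;
  else
    echo &amp;quot;$REMOTE_FLAKE#$2&amp;quot;
  fi
}

# deploy HOST: build the configuration of HOST and switch the running system to it
deploy() {
  sudo nixos-rebuild switch --flake &amp;quot;$(flake_ref &amp;quot;$HOME&amp;#x2F;git&amp;#x2F;dotfiles&amp;quot; &amp;quot;$1&amp;quot;)&amp;quot;
}

# usage, on the host itself:
#   deploy web01
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Checking for &lt;code&gt;flake.nix&lt;&#x2F;code&gt; rather than just the directory keeps an empty or unrelated directory from shadowing the GitHub fallback.&lt;&#x2F;p&gt;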
&lt;h2 id=&quot;what-happens-after-generation&quot;&gt;What happens after generation&lt;&#x2F;h2&gt;
&lt;p&gt;The generator gets you to a buildable configuration. It does not do everything. After the files land, you still need to:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;Merge the generated &lt;code&gt;users.nix&lt;&#x2F;code&gt; into any existing user registry&lt;&#x2F;li&gt;
&lt;li&gt;Paste the flake snippet into &lt;code&gt;flake.nix&lt;&#x2F;code&gt;&lt;&#x2F;li&gt;
&lt;li&gt;Add your site-specific module imports — in this case, the perlpimpnet NixOS module&lt;&#x2F;li&gt;
&lt;li&gt;If you enabled secrets, set up sops-nix: get the host’s age key, add it to &lt;code&gt;.sops.yaml&lt;&#x2F;code&gt;, create the secrets file&lt;&#x2F;li&gt;
&lt;li&gt;Build and deploy&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;p&gt;The tool prints these steps when it finishes. It does not try to be clever about automating things that require human judgment — like whether your secrets should use age or GPG, or which network interface your server uses. It handles the boilerplate. You handle the decisions.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;why-rust&quot;&gt;Why Rust&lt;&#x2F;h2&gt;
&lt;p&gt;The generator is built with clap for argument parsing, dialoguer for the interactive prompts, and serde for config serialisation. It has unit tests for rendering determinism and schema validation. It builds with &lt;code&gt;rustPlatform.buildRustPackage&lt;&#x2F;code&gt; in the flake and participates in &lt;code&gt;nix flake check&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;I could have written this as a bash script. I have written this as a bash script, more than once, for other projects. Bash is fine for gluing together commands, but the moment you need structured input validation, schema versioning, or deterministic output, you are fighting the language instead of using it. A Rust binary that either produces correct output or fails with a clear error is worth the marginal extra effort over a shell script that silently does the wrong thing when you typo an argument.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-broader-point&quot;&gt;The broader point&lt;&#x2F;h2&gt;
&lt;p&gt;NixOS already solves the “works on my machine” problem for system configuration. But the act of creating that configuration — the scaffolding step — is still a manual, copy-paste-and-edit workflow in most setups. The reproducibility of the system doesn’t help if the process of defining the system is ad hoc.&lt;&#x2F;p&gt;
&lt;p&gt;A template generator is not a novel idea. Every web framework has &lt;code&gt;create-app&lt;&#x2F;code&gt;. Every language has &lt;code&gt;init&lt;&#x2F;code&gt;. What’s less common is applying this to infrastructure definitions, where the cost of a wrong default is not a broken dev server but a misconfigured production host.&lt;&#x2F;p&gt;
&lt;p&gt;The goal is not to remove the need to understand what the configuration does. It is to remove the need to remember it. The generator encodes the conventions of the flake — where files go, how deploy scripts are wired, which modules exist — so that adding a new host is a matter of answering questions about the host, not about the flake’s internal structure. The knowledge lives in the tool, not in your head.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Flutter and Nix: When Your SDK Thinks It Owns the Filesystem</title>
        <published>2026-03-08T10:00:00+00:00</published>
        <updated>2026-03-08T10:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/flutter-and-nix/"/>
        <id>https://perlpimp.net/blog/flutter-and-nix/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/flutter-and-nix/">&lt;p&gt;So this started, as these things usually do, with a build that wouldn’t build. I was setting up a Flutter project inside a Nix flake — nothing exotic, just wanting reproducible dev environments like a civilised person — and iOS builds were failing in ways that made no sense until they made too much sense.&lt;&#x2F;p&gt;
&lt;p&gt;What followed was a few hours of digging through Xcode build phases, Gradle internals, Apple code signing documentation, and Flutter issue trackers. I came out the other side with a working — if impure — set of workarounds and a much clearer picture of why Flutter and Nix are fundamentally at odds. This is that write-up.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-symptoms&quot;&gt;The symptoms&lt;&#x2F;h2&gt;
&lt;p&gt;The immediate failures were straightforward enough. Builds would die because &lt;code&gt;codesign&lt;&#x2F;code&gt; couldn’t write to framework files that lived in the Nix store. &lt;code&gt;rsync&lt;&#x2F;code&gt; would fail copying frameworks because the source was read-only and the tooling expected to be able to modify what it copied. On the Android side, Gradle would try to create build output directories inside the Flutter SDK path itself — which, being in &lt;code&gt;&#x2F;nix&#x2F;store&#x2F;&lt;&#x2F;code&gt;, wasn’t going to happen.&lt;&#x2F;p&gt;
&lt;p&gt;The error messages all pointed the same direction: Flutter expected to write to places that Nix had made read-only. Not build output directories — the SDK’s own installation.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-i-found-the-ios-side&quot;&gt;What I found: the iOS side&lt;&#x2F;h2&gt;
&lt;p&gt;When you build a Flutter iOS app, Xcode invokes &lt;code&gt;xcode_backend.sh&lt;&#x2F;code&gt; during its Build Phases. This script copies &lt;code&gt;Flutter.framework&lt;&#x2F;code&gt; from the engine artifacts cache — under &lt;code&gt;bin&#x2F;cache&#x2F;artifacts&#x2F;engine&#x2F;ios&lt;&#x2F;code&gt; in the SDK tree — into the project and then into the built products directory. Older Flutter versions dumped it into &lt;code&gt;ios&#x2F;Flutter&#x2F;Flutter.framework&lt;&#x2F;code&gt;; newer ones target &lt;code&gt;BUILT_PRODUCTS_DIR&lt;&#x2F;code&gt; (see &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;flutter&#x2F;flutter&#x2F;issues&#x2F;70224&quot;&gt;flutter&#x2F;flutter#70224&lt;&#x2F;a&gt;).&lt;&#x2F;p&gt;
&lt;p&gt;These copies need to be writable because Apple’s &lt;code&gt;codesign&lt;&#x2F;code&gt; embeds signature data directly into the Mach-O binary. It physically modifies the file. There’s no detached signing option for embedded frameworks — the binary itself gets rewritten with the developer’s identity. That’s an Apple platform constraint, not something Flutter invented.&lt;&#x2F;p&gt;
&lt;p&gt;The thing is, the build output directory is already writable — that’s the whole point of a build directory. The correct approach is to read from the immutable SDK, copy into the mutable build output, and sign there. Flutter’s pipeline mostly does this now, but the assumption of SDK-level writability is still baked into enough places that it breaks on read-only filesystems.&lt;&#x2F;p&gt;
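&lt;p&gt;A minimal sketch of that copy-then-sign pattern in shell. The function name &lt;code&gt;stage_artifact&lt;&#x2F;code&gt; and the variables in the usage comment are hypothetical; this is not what Flutter’s &lt;code&gt;xcode_backend.sh&lt;&#x2F;code&gt; actually does. The point is that only the copy is ever made writable, and a stale read-only copy from a previous build gets unlocked before it is deleted.&lt;&#x2F;p&gt;

```shell
#!/usr/bin/env bash
# Sketch: copy a read-only artifact (e.g. from /nix/store) into a writable
# build directory, so a later step such as codesign modifies the copy and
# never the original.
set -eu

stage_artifact() {
  local src="$1" dest="$2"

  # A previous build may have left a read-only copy behind. Restore write
  # permission first, otherwise rm cannot delete entries inside it.
  if [ -e "$dest" ]; then
    chmod -R u+w "$dest"
    rm -rf "$dest"
  fi

  mkdir -p "$(dirname "$dest")"
  # cp -R gives new files the source's permission bits (minus the umask),
  # so copies of read-only store files start out read-only too...
  cp -R "$src" "$dest"
  # ...which is why the copy, and only the copy, is made writable here.
  chmod -R u+w "$dest"
}

# Hypothetical usage in a build phase ($sdk_framework and $identity are
# placeholders, not real Flutter variables):
#   stage_artifact "$sdk_framework" "$BUILT_PRODUCTS_DIR/Flutter.framework"
#   codesign --force --sign "$identity" "$BUILT_PRODUCTS_DIR/Flutter.framework"
```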
&lt;h2 id=&quot;what-i-found-the-android-gradle-side&quot;&gt;What I found: the Android&#x2F;Gradle side&lt;&#x2F;h2&gt;
&lt;p&gt;The Flutter Gradle plugin tries to build under &lt;code&gt;$FLUTTER_SDK&#x2F;packages&#x2F;flutter_tools&#x2F;gradle&lt;&#x2F;code&gt;. That path is inside the SDK distribution: newer Android project templates pull the plugin’s source in as an included Gradle build, and Gradle writes the compiled output next to that source by default. When &lt;code&gt;$FLUTTER_SDK&lt;&#x2F;code&gt; lives in the Nix store, Gradle fails with &lt;code&gt;Failed to create parent directory&lt;&#x2F;code&gt; errors pointing at &lt;code&gt;&#x2F;nix&#x2F;store&#x2F;...&lt;&#x2F;code&gt;.&lt;&#x2F;p&gt;
&lt;p&gt;This is documented in &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;NixOS&#x2F;nixpkgs&#x2F;issues&#x2F;260278&quot;&gt;NixOS&#x2F;nixpkgs#260278&lt;&#x2F;a&gt; and &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;NixOS&#x2F;nixpkgs&#x2F;issues&#x2F;289936&quot;&gt;#289936&lt;&#x2F;a&gt;. The workaround is to patch &lt;code&gt;build.gradle.kts&lt;&#x2F;code&gt; to redirect &lt;code&gt;buildDir&lt;&#x2F;code&gt; via an environment variable (&lt;code&gt;FLUTTER_GRADLE_PLUGIN_BUILDDIR&lt;&#x2F;code&gt;). It works, but it’s a patch against a design that shouldn’t need patching.&lt;&#x2F;p&gt;
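&lt;p&gt;On the consuming side, the dev shell then has to point that variable somewhere writable. A minimal sketch, assuming the patched plugin reads &lt;code&gt;FLUTTER_GRADLE_PLUGIN_BUILDDIR&lt;&#x2F;code&gt; as described in those issues; the function name and the project-local location are illustrative assumptions, chosen so that different projects (and SDK versions) never share one output tree.&lt;&#x2F;p&gt;

```shell
#!/usr/bin/env bash
# Sketch of a dev-shell hook (for example in a Nix shellHook) that gives the
# patched Flutter Gradle plugin a writable build directory. Assumes the
# patch reads FLUTTER_GRADLE_PLUGIN_BUILDDIR; the location is illustrative.
set -eu

setup_gradle_plugin_builddir() {
  local root
  # Use the repository root, falling back to the current directory.
  root="$(git rev-parse --show-toplevel 2>/dev/null || pwd)"
  # Keep the plugin's build output inside the project's build/ directory
  # instead of under the read-only SDK.
  export FLUTTER_GRADLE_PLUGIN_BUILDDIR="$root/build/flutter-gradle-plugin"
  mkdir -p "$FLUTTER_GRADLE_PLUGIN_BUILDDIR"
}

# In the shell hook:
#   setup_gradle_plugin_builddir
```

&lt;p&gt;Putting it under the project’s &lt;code&gt;build&#x2F;&lt;&#x2F;code&gt; directory means &lt;code&gt;flutter clean&lt;&#x2F;code&gt; wipes it along with everything else, which only costs a rebuild of the plugin.&lt;&#x2F;p&gt;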
&lt;h2 id=&quot;what-i-found-the-sdk-itself&quot;&gt;What I found: the SDK itself&lt;&#x2F;h2&gt;
&lt;p&gt;Beyond the platform-specific issues, Flutter’s SDK is architecturally hostile to immutability. It caches downloaded engine artifacts under &lt;code&gt;bin&#x2F;cache&#x2F;&lt;&#x2F;code&gt;. It runs self-update checks. Its version detection expects a &lt;code&gt;.git&lt;&#x2F;code&gt; directory to be present and writable. It was designed as a self-managing tool — think &lt;code&gt;rustup&lt;&#x2F;code&gt; or &lt;code&gt;nvm&lt;&#x2F;code&gt; — that owns its entire directory tree.&lt;&#x2F;p&gt;
&lt;p&gt;There’s a key issue on the Flutter tracker that captures the whole problem: &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;github.com&#x2F;flutter&#x2F;flutter&#x2F;issues&#x2F;118162&quot;&gt;#118162&lt;&#x2F;a&gt;, titled “Immutable and Offline Flutter SDK mode”. The filing states it plainly — Flutter requires both write access to itself and internet access during initialisation. This makes it impossible to ship as a system package in RPM, DEB, or Flatpak format. The end user doesn’t have write access to &lt;code&gt;&#x2F;usr&#x2F;lib&#x2F;flutter&lt;&#x2F;code&gt;, and a Flatpak SDK extension doesn’t get internet during builds.&lt;&#x2F;p&gt;
&lt;p&gt;The issue was closed as a duplicate. As far as I can tell, the underlying problem hasn’t been addressed.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-the-nix-community-has-done-about-it&quot;&gt;What the Nix community has done about it&lt;&#x2F;h2&gt;
&lt;p&gt;There’s a whole ecosystem of workarounds at this point. The nixpkgs Flutter package creates a &lt;code&gt;flutter-wrapped&lt;&#x2F;code&gt; symlink forest around the read-only store path. Developers write patch scripts that intercept tools like &lt;code&gt;codesign&lt;&#x2F;code&gt; and &lt;code&gt;rsync&lt;&#x2F;code&gt; to make files writable before the real binaries run. Others patch Gradle build files, symlink &lt;code&gt;.git&lt;&#x2F;code&gt; directories from wrapped to unwrapped SDK paths, and various other creative horrors.&lt;&#x2F;p&gt;
&lt;p&gt;Manuel Plavsic documented a comprehensive set of steps in &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;manuelplavsic.ch&#x2F;articles&#x2F;flutter-environment-with-nix&#x2F;&quot;&gt;his guide&lt;&#x2F;a&gt;. The NixOS Discourse has threads going back years of people hitting the same wall. Every workaround exists for the same reason: Flutter doesn’t separate its distribution from its build state.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;what-i-ended-up-doing&quot;&gt;What I ended up doing&lt;&#x2F;h2&gt;
&lt;p&gt;I wrote an &lt;code&gt;apply-ios-fixes&lt;&#x2F;code&gt; script for the flake. It installs wrapper scripts for &lt;code&gt;codesign&lt;&#x2F;code&gt; and &lt;code&gt;rsync&lt;&#x2F;code&gt; that &lt;code&gt;chmod&lt;&#x2F;code&gt; files writable before the real tools touch them, and it patches &lt;code&gt;project.pbxproj&lt;&#x2F;code&gt; to prepend a scripts directory to &lt;code&gt;PATH&lt;&#x2F;code&gt; in the &lt;code&gt;xcode_backend.sh&lt;&#x2F;code&gt; build phases:&lt;&#x2F;p&gt;
&lt;pre data-lang=&quot;nix&quot; class=&quot;language-nix &quot;&gt;&lt;code class=&quot;language-nix&quot; data-lang=&quot;nix&quot;&gt;apply-ios-fixes =
  pkgs.writeShellScriptBin &amp;quot;apply-ios-fixes&amp;quot; &amp;#x27;&amp;#x27;
    set -euo pipefail

    root=&amp;quot;$(${pkgs.git}&amp;#x2F;bin&amp;#x2F;git rev-parse --show-toplevel 2&amp;gt;&amp;#x2F;dev&amp;#x2F;null || pwd)&amp;quot;
    nix_scripts=&amp;quot;$root&amp;#x2F;nix&amp;#x2F;ios-scripts&amp;quot;
    ios_scripts=&amp;quot;$root&amp;#x2F;ios&amp;#x2F;scripts&amp;quot;
    pbxproj=&amp;quot;$root&amp;#x2F;ios&amp;#x2F;Runner.xcodeproj&amp;#x2F;project.pbxproj&amp;quot;

    if [ ! -f &amp;quot;$pbxproj&amp;quot; ]; then
      echo &amp;quot;error: project.pbxproj not found&amp;quot; &amp;gt;&amp;amp;2
      exit 1
    fi

    # Install wrapper scripts
    mkdir -p &amp;quot;$ios_scripts&amp;quot;
    cp &amp;quot;$nix_scripts&amp;#x2F;codesign&amp;quot; &amp;quot;$ios_scripts&amp;#x2F;codesign&amp;quot;
    cp &amp;quot;$nix_scripts&amp;#x2F;rsync&amp;quot;    &amp;quot;$ios_scripts&amp;#x2F;rsync&amp;quot;
    chmod +x &amp;quot;$ios_scripts&amp;#x2F;codesign&amp;quot; &amp;quot;$ios_scripts&amp;#x2F;rsync&amp;quot;
    echo &amp;quot;installed ios&amp;#x2F;scripts&amp;#x2F;{codesign,rsync}&amp;quot;

    # Patch project.pbxproj – prepend PATH to
    # xcode_backend.sh build phases
    if grep -q &amp;#x27;PROJECT_DIR}&amp;#x2F;scripts&amp;#x27; &amp;quot;$pbxproj&amp;quot;; then
      echo &amp;quot;project.pbxproj already patched&amp;quot;
    else
      ${pkgs.gnused}&amp;#x2F;bin&amp;#x2F;sed -i \
        &amp;#x27;&amp;#x2F;xcode_backend\.sh\\&amp;quot; embed_and_thin&amp;#x2F;s|shellScript = &amp;quot;&amp;#x2F;bin&amp;#x2F;sh|shellScript = &amp;quot;export PATH=\\&amp;quot;&amp;#x27;&amp;#x27;${PROJECT_DIR}&amp;#x2F;scripts:&amp;#x27;&amp;#x27;${PATH}\\&amp;quot;\\n&amp;#x2F;bin&amp;#x2F;sh|&amp;#x27; \
        &amp;quot;$pbxproj&amp;quot;

      ${pkgs.gnused}&amp;#x2F;bin&amp;#x2F;sed -i \
        &amp;#x27;&amp;#x2F;xcode_backend\.sh\\&amp;quot; build&amp;quot;&amp;#x2F;s|shellScript = &amp;quot;&amp;#x2F;bin&amp;#x2F;sh|shellScript = &amp;quot;# Nix codesign fix: wrapper makes files writable before signing\\nexport PATH=\\&amp;quot;&amp;#x27;&amp;#x27;${PROJECT_DIR}&amp;#x2F;scripts:&amp;#x27;&amp;#x27;${PATH}\\&amp;quot;\\n&amp;#x2F;bin&amp;#x2F;sh|&amp;#x27; \
        &amp;quot;$pbxproj&amp;quot;

      echo &amp;quot;patched project.pbxproj&amp;quot;
    fi

    echo &amp;quot;done – iOS Nix fixes applied&amp;quot;
  &amp;#x27;&amp;#x27;;
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
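&lt;p&gt;The wrapper scripts themselves aren’t shown above. Here is a hypothetical reconstruction of the idea, not the actual files: make every existing path argument writable, then hand the unchanged argument list to the real tool, called by absolute path so the wrapper doesn’t find itself again through the modified &lt;code&gt;PATH&lt;&#x2F;code&gt;. The function name and the &lt;code&gt;&#x2F;usr&#x2F;bin&lt;&#x2F;code&gt; tool paths are assumptions.&lt;&#x2F;p&gt;

```shell
#!/usr/bin/env bash
# Hypothetical reconstruction of a wrapper such as ios/scripts/codesign:
# unlock every existing path argument, then run the real tool unchanged.
set -eu

run_with_writable_paths() {
  local real_tool="$1"
  shift
  local arg
  for arg in "$@"; do
    # Crude heuristic: any argument naming an existing file or directory
    # may be something the tool wants to modify. Failures are ignored,
    # because source paths inside the Nix store can never be chmodded.
    if [ -e "$arg" ]; then
      chmod -R u+w -- "$arg" 2>/dev/null || true
    fi
  done
  # Pass the original arguments through untouched; the wrapper's exit
  # status is the real tool's exit status.
  "$real_tool" "$@"
}

# ios/scripts/codesign would then end with:
#   run_with_writable_paths /usr/bin/codesign "$@"
# and ios/scripts/rsync with:
#   run_with_writable_paths /usr/bin/rsync "$@"
```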
&lt;p&gt;It’s impure. It’s the least of all evils. It gets the job done.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-takeaway&quot;&gt;The takeaway&lt;&#x2F;h2&gt;
&lt;p&gt;The root cause here isn’t technical complexity — it’s a design philosophy. Flutter treats its SDK directory as a mutable workspace. The SDK is simultaneously the distribution, the build cache, the artifact store, and the update mechanism. Everything co-located, everything writable.&lt;&#x2F;p&gt;
&lt;p&gt;Compare this to how other toolchains handle it. Rust’s &lt;code&gt;cargo&lt;&#x2F;code&gt; never writes back to the toolchain directory. Go’s &lt;code&gt;GOROOT&lt;&#x2F;code&gt; is read-only by design — &lt;code&gt;GOPATH&lt;&#x2F;code&gt; and the module cache live elsewhere. Even Node.js keeps &lt;code&gt;node_modules&lt;&#x2F;code&gt; in the project tree, not inside the runtime installation. The pattern of separating your distribution from your build state is well-established. Flutter just doesn’t follow it.&lt;&#x2F;p&gt;
&lt;p&gt;I’d call this horrendous citizenship in the wider software ecosystem. The “SDK is the world” mentality — where one tool absorbs dependency management, version control, artifact caching, and the build itself into a single mutable blob — is convenient for getting started. &lt;code&gt;flutter run&lt;&#x2F;code&gt; works on your laptop, in your home directory, with internet access. But it means your SDK can’t participate in Nix, can’t be distributed as a system package, can’t run in air-gapped CI, and can’t be shared read-only across users on a build server.&lt;&#x2F;p&gt;
&lt;p&gt;More focus should be put into doing things right, not just what’s easy. Flutter, with all of Google’s resources behind it, has no excuse for not having figured this out by now.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Lead the LLM, Don&#x27;t Let It Lead You</title>
        <published>2026-03-07T18:00:00+00:00</published>
        <updated>2026-03-07T18:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/lead-the-llm/"/>
        <id>https://perlpimp.net/blog/lead-the-llm/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/lead-the-llm/">&lt;p&gt;There is a misconception taking hold among developers using LLMs: that the code the model produces is the artifact worth keeping. It is not. The code is disposable. The prompt is the source.&lt;&#x2F;p&gt;
&lt;p&gt;The correct workflow is not prompt, accept, build on top. It is:&lt;&#x2F;p&gt;
&lt;pre&gt;&lt;code&gt;prompt
  → evaluate
  → reset to HEAD
  → revise prompt
  → evaluate
  → repeat
&lt;&#x2F;code&gt;&lt;&#x2F;pre&gt;
&lt;p&gt;Every time you accept generated code and start layering on top of it, you are accumulating debt against a foundation you did not write, do not fully understand, and cannot efficiently modify. You have traded authorship for speed and lost both.&lt;&#x2F;p&gt;
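&lt;p&gt;Assuming the work lives in git, the “reset to HEAD” step can be sketched as a small shell function (the name is hypothetical). It discards edits to tracked files and deletes whatever new files the model created, so the next generation starts from the same tree as the last one.&lt;&#x2F;p&gt;

```shell
#!/usr/bin/env bash
# Sketch of the "reset to HEAD" step between generations.
set -eu

reset_generation() {
  # Throw away every modification to tracked files.
  git reset --hard HEAD
  # Delete untracked files and directories the model created.
  # Ignored files are left alone (that would need -x).
  git clean -fd
}
```

&lt;p&gt;&lt;code&gt;git clean -fd&lt;&#x2F;code&gt; deletes every untracked file, not just the generated ones, so commit the prompt or keep it outside the repository.&lt;&#x2F;p&gt;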
&lt;h2 id=&quot;why-the-code-has-no-value&quot;&gt;Why the code has no value&lt;&#x2F;h2&gt;
&lt;p&gt;An LLM will produce working code on the first try often enough to be dangerous. The problem is that “working” is the lowest bar. The generated code might be correct but:&lt;&#x2F;p&gt;
&lt;ul&gt;
&lt;li&gt;Structured in a way that fights the rest of your codebase&lt;&#x2F;li&gt;
&lt;li&gt;Full of unnecessary abstractions “just in case”&lt;&#x2F;li&gt;
&lt;li&gt;Using patterns the model was trained on rather than patterns that fit your problem&lt;&#x2F;li&gt;
&lt;li&gt;Subtly wrong in ways that only surface later&lt;&#x2F;li&gt;
&lt;&#x2F;ul&gt;
&lt;p&gt;None of this matters if you treat the code as a disposable proof of concept. All of it matters if you commit it and move on.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-prompt-is-the-artifact&quot;&gt;The prompt is the artifact&lt;&#x2F;h2&gt;
&lt;p&gt;When you reset to HEAD and revise your prompt instead of patching the output, you are doing something that looks wasteful but is actually efficient. You are:&lt;&#x2F;p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Keeping your specification clean.&lt;&#x2F;strong&gt; The prompt is a declarative description of what you want. The code is one possible implementation. When you fix the code directly, your specification and implementation diverge immediately.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Getting a fresh generation every time.&lt;&#x2F;strong&gt; LLMs do not carry the baggage of their previous attempts unless you let them. A revised prompt produces revised code — not a patch on top of a patch.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Staying in control.&lt;&#x2F;strong&gt; You are editing a document you wrote (the prompt) instead of editing a document the machine wrote (the code). One of these you understand completely. The other you are guessing at.&lt;&#x2F;p&gt;
&lt;&#x2F;li&gt;
&lt;&#x2F;ol&gt;
&lt;h2 id=&quot;when-to-stop-iterating&quot;&gt;When to stop iterating&lt;&#x2F;h2&gt;
&lt;p&gt;You stop when the generated code meets your standards on a clean read. When you can look at the output, understand every line, and judge it as something you would have written yourself given enough time. That is the commit point — not before.&lt;&#x2F;p&gt;
&lt;p&gt;This means the prompt has to get specific. But not too specific.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-prompt-balance&quot;&gt;The prompt balance&lt;&#x2F;h2&gt;
&lt;p&gt;Vague prompts produce vague code. That much is obvious. But the instinct to fix this by writing maximally detailed prompts creates its own problems.&lt;&#x2F;p&gt;
&lt;p&gt;An overly concise prompt leaves too much to the model’s discretion and you get output shaped by its training data rather than your intent. But an overly elaborate prompt introduces a subtler risk: you start encoding your own assumptions as constraints, and some of those assumptions are wrong. This is human tech debt — knowledge that is outdated, incomplete, or simply incorrect — leaking into the specification. The model would have made a better choice if you had not told it otherwise.&lt;&#x2F;p&gt;
&lt;p&gt;Worse, long and detailed prompts tend to accumulate conflicting reasoning. You specify one thing in paragraph two and contradict it in paragraph six. You may not notice, but the model will be pulled in both directions. LLMs are not deterministic to begin with — the same prompt will usually produce some variance in output. But a prompt full of internal contradictions amplifies this dramatically. Instead of reasonable variation, you get wildly divergent outputs from the same input. The generation becomes unpredictable in ways that make evaluation harder and iteration slower.&lt;&#x2F;p&gt;
&lt;p&gt;The sweet spot is a prompt that is precise about what matters and silent about what does not. State the constraints that actually constrain. Describe the behaviour you need. Leave the implementation decisions you do not care about to the model — it may know more current patterns than you do. Each iteration should make the prompt more precise, not longer. If your prompt is growing but your output quality is not improving, you are adding noise, not signal.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;scaling-up-the-two-phase-approach&quot;&gt;Scaling up: the two-phase approach&lt;&#x2F;h2&gt;
&lt;p&gt;Everything above works well for small, self-contained projects — a CLI tool, a single module, a script. But once a project has real scope, ad hoc prompts stop scaling. You need to formalise them.&lt;&#x2F;p&gt;
&lt;p&gt;The approach is two phases. In the first phase, you are not writing code at all. You are iterating on documents: a &lt;code&gt;SPEC.md&lt;&#x2F;code&gt; that defines what the system does, and an &lt;code&gt;ARCHITECTURE.md&lt;&#x2F;code&gt; that defines how it is structured. You use the same reset-and-revise loop, but what you are evaluating is the documents themselves, not code. You iterate until these documents are precise, consistent, and complete enough to serve as the source of truth for the entire project.&lt;&#x2F;p&gt;
&lt;p&gt;In the second phase, when you generate code, the LLM works under these documents. Every prompt references them. Every evaluation checks the output against them. The documents are the constitution; the code is legislation that must comply with it.&lt;&#x2F;p&gt;
&lt;p&gt;These documents will evolve as new features are added — they are not frozen after the first phase. But they must always be held to the highest standard, because they are what keep the LLM from steering off course. Without them, each generation drifts further from your intent. With them, you have guardrails that compound in value over time.&lt;&#x2F;p&gt;
&lt;p&gt;This only works if you protect the documents. When you iterate on features or changes, the LLM will sometimes want to modify the spec or the architecture to accommodate its implementation. This is where you need to be most critical. Changes to these documents should either be made without an LLM — by you, deliberately — or subjected to a much higher degree of scrutiny than changes to code. A bad line of code the LLM can rewrite by refining a prompt. A bad line in your spec is a policy error that propagates into every future generation.&lt;&#x2F;p&gt;
&lt;p&gt;The spec drifts, the project drifts. Hold the model accountable to the documents, not the other way around.&lt;&#x2F;p&gt;
&lt;h2 id=&quot;the-implication&quot;&gt;The implication&lt;&#x2F;h2&gt;
&lt;p&gt;Whether it is a one-off prompt or a &lt;code&gt;SPEC.md&lt;&#x2F;code&gt; that governs an entire project, the artifact that matters is the one you wrote — not the one the model generated. The skill is not “using an LLM.” It is writing precise specifications under constraints. That is the same skill it has always been. The tool changed. The job did not.&lt;&#x2F;p&gt;
</content>
        
    </entry>
    <entry xml:lang="en">
        <title>Hello, World</title>
        <published>2026-03-07T10:00:00+00:00</published>
        <updated>2026-03-07T10:00:00+00:00</updated>
        
        <author>
          <name>
            
            Ian Johannesen
            
          </name>
        </author>
        
        <link rel="alternate" type="text/html" href="https://perlpimp.net/blog/hello-world/"/>
        <id>https://perlpimp.net/blog/hello-world/</id>
        
        <content type="html" xml:base="https://perlpimp.net/blog/hello-world/">&lt;p&gt;Welcome to perlpimp.net. This site is built with &lt;a rel=&quot;noopener noreferrer&quot; target=&quot;_blank&quot; href=&quot;https:&#x2F;&#x2F;www.getzola.org&#x2F;&quot;&gt;Zola&lt;&#x2F;a&gt;, packaged as a Nix flake, and served by nginx on NixOS.&lt;&#x2F;p&gt;
&lt;p&gt;More to come.&lt;&#x2F;p&gt;
</content>
        
    </entry>
</feed>
