Website Infrastructure Design

The AXIVO website runs on Cloudflare's edge for the price of a coffee and donut a month. The pieces are conventional (Next.js, Nextra, OpenNext, R2, Workers KV), but the way they cooperate is not. Most edge architectures use Cloudflare Workers as the application server. This setup uses them as a cache populator. Visitor traffic doesn't pay for compute, because almost nothing runs.

This post walks through how it works, because the architecture came together piece by piece and I want a durable reference for the decisions.

Design Plan

The starting point wasn't a blank slate. The domain was already on Cloudflare for DNS, so when it came time to host the website, the Cloudflare Worker was the path of least resistance — no new vendor, no new bill, no new dashboard. That constraint shaped every architectural choice that followed. The work was figuring out how to put a Next.js application on a Worker without paying for compute on every visitor request.

The Worker Purpose

Most Cloudflare Worker site architectures use the Worker as the application server. It runs on every request, handles routing, calls the origin, returns the response. The Worker is the hot path.

This site doesn't work that way. The Worker is a cache populator and policy engine. It runs rarely, and when it does, its job is to decide what the next requester at this PoP will see — without running itself again.

In steady state, every URL in the sitemap is a 100% Cloudflare zone-CDN hit. The Worker is not invoked. No CPU, no KV reads, no R2 calls billed. The CDN serves the visitor and the Worker stays idle. A traffic spike to existing pages costs nothing because nothing extra runs.

Getting there is not a single fix. It's a contract between layers.

The Layered Contract

Cloudflare's zone CDN sits in front of the Worker. A Cache Rule at the zone level matches axivo.com traffic and:

  • Excludes the routes that need to reach the Worker — /_next/data, /__internal, and RSC requests via ?_rsc= query string
  • Respects the cache-control header the origin sends

For everything else, the zone serves the cached response and the Worker is bypassed entirely.
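The rule itself lives in Cloudflare's zone configuration rather than in code, but its matching logic can be mirrored as a small predicate for reasoning or tests. This is a sketch only: `bypassesZoneCache` is a hypothetical name, and prefix matching on the two paths is an assumption.

```javascript
// Hypothetical mirror of the zone Cache Rule exclusions described above.
// Assumption: the two paths are matched as prefixes.
const WORKER_ONLY_PREFIXES = ['/_next/data', '/__internal'];

function bypassesZoneCache(requestUrl) {
  const url = new URL(requestUrl);
  // RSC requests are identified by the ?_rsc= query parameter.
  if (url.searchParams.has('_rsc')) return true;
  return WORKER_ONLY_PREFIXES.some((prefix) => url.pathname.startsWith(prefix));
}
```

Anything for which this returns false is eligible to be answered by the zone without the Worker running.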

For the contract to hold, two things have to be true:

  • Worker emits cache-control headers the zone is willing to honor — s-maxage with no no-store, no-cache, or private directives
  • Response can't carry a Vary header with non-standard values — Cloudflare's CDN refuses to cache anything with a Vary outside Accept-Encoding
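The two conditions can be expressed as a check over raw header values. This is a minimal sketch, not the site's code: `isZoneCacheable` is a hypothetical helper, and it only encodes the two rules stated above.

```javascript
// Returns true when the headers satisfy both conditions of the contract:
// an s-maxage with no blocking directives, and no Vary beyond Accept-Encoding.
function isZoneCacheable(cacheControl, vary) {
  const directives = (cacheControl || '')
    .toLowerCase()
    .split(',')
    .map((directive) => directive.trim())
    .filter(Boolean);
  const names = directives.map((directive) => directive.split('=')[0]);
  const hasSharedTtl = directives.some((directive) => /^s-maxage=\d+$/.test(directive));
  const blocked = ['no-store', 'no-cache', 'private'].some((name) => names.includes(name));
  const varyValues = (vary || '')
    .toLowerCase()
    .split(',')
    .map((value) => value.trim())
    .filter(Boolean);
  const varyOk = varyValues.every((value) => value === 'accept-encoding');
  return hasSharedTtl && !blocked && varyOk;
}
```

A check like this is mostly useful in tests, as a guard that a code change hasn't quietly broken the contract.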

The Vary Header

Next.js App Router defeats the second condition by default. OpenNext's prerendered responses ship Vary: RSC, Next-Router-State-Tree, Next-Router-Prefetch, Next-Router-Segment-Prefetch, Next-Url — five header variants meant to keep RSC and HTML payloads from colliding in client-side caches. The zone CDN sees that header and walks away. This is the default state of an OpenNext deployment: the Worker runs on every request, the bill quietly accumulates, and the failure is invisible because everything looks correct end-to-end.

The Worker handles the gap by overwriting Vary to Accept-Encoding on cacheable responses before they reach the zone. The structural separation that the original Vary was protecting — RSC vs HTML at the same URL — is already enforced at the Cache Rule level: the rule excludes RSC requests by query string, and inside the Worker, the wrapper bypasses caching entirely for any request carrying an rsc or next-router-prefetch header. So the original Vary was protecting against a collision that can't happen on this site, and removing it for the zone's benefit costs nothing semantic.
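A sketch of what such a rewrite could look like, using the standard `Headers` and `Response` APIs. The function names are hypothetical, and treating "cacheable" as "carries an s-maxage" is an assumption rather than the actual check in worker.js.

```javascript
// Requests carrying these headers skip the rewrite entirely, as described above.
function shouldBypass(requestHeaders) {
  return requestHeaders.has('rsc') || requestHeaders.has('next-router-prefetch');
}

// Assumption: a response counts as cacheable when it carries s-maxage.
function normalizeVary(requestHeaders, response) {
  if (shouldBypass(requestHeaders)) return response;
  const cacheControl = response.headers.get('cache-control') || '';
  if (!/\bs-maxage=\d+/i.test(cacheControl)) return response;
  // Copy the headers so the rewrite also works on immutable responses.
  const headers = new Headers(response.headers);
  headers.set('vary', 'Accept-Encoding');
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}
```

The copy-then-rebuild step matters on Workers, where responses returned by upstream calls generally have immutable headers.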

A simple diagnostic with two consecutive HEAD requests to the homepage:

    $ curl -sI https://axivo.com | grep -iE "age:|cf-cache-status|cf-ray"
    cf-ray: 9f194d1d2bd4ab7c-YYZ
    $ curl -sI https://axivo.com | grep -iE "age:|cf-cache-status|cf-ray"
    cf-ray: 9f194d2bdda437a1-YYZ
    cf-cache-status: HIT
    age: 2

The first response carries only a cf-ray — Cloudflare's per-request edge identifier. No cf-cache-status, no age. The zone hadn't seen this URL since the last purge, so the request fell through to the Worker, which rendered the response and handed it back through the zone on the way out. The zone stored it on that return trip.

The second response has cf-cache-status: HIT and an age counter ticking. The zone is now serving from cache, and the Worker stays out of the path.

Note

The Vary rewrite is the single most important line in worker.js for cost. Without it, the zone CDN refuses to cache the response on its way out — the second request would also miss, the Worker would run again, and so on for every visitor. The constraint is implicit across two docs — Cloudflare's caching rules and OpenNext's RSC handling — and the giveaway is cf-cache-status never showing up no matter how many times you retry.

Status-Based Cache Policy

There's a second piece of policy in the same place. OpenNext's incremental cache replays prerendered 404 pages with the same s-maxage=31536000 it applies to 2xx pages. Without intervention, a typo URL gets cached for a year at the edge and there's no way to recover it short of a zone purge. The Worker rewrites cache-control based on response status before the response leaves: 60 seconds for 404 and 410, 24 hours for 301 and 308, no-store for 302, 307, and any 5xx. The status table is six entries. New policy is a one-line edit.

The same logic also acts as a safety floor for 5xx responses. If origin ever sends a 503 with cache-control: public, s-maxage=3600 by bug or misconfiguration, the Worker overrides it to no-store regardless. Origin can opt out of caching for 3xx and 4xx — it cannot opt out of the 5xx safety floor. That's a design choice: status-specific failures should not be allowed to convince the cache they're cacheable, even if origin says so.

The Worker holds the policy. The zone enforces it.
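The status policy can be pictured as a lookup table plus a range check. This is a sketch under assumptions: the exact header strings, the names `STATUS_POLICY` and `cacheControlFor`, and the way an origin opt-out is detected are all illustrative, not taken from worker.js.

```javascript
// Six status-specific entries; 5xx is handled as a range, not a table row.
// Assumption: the exact directive strings are illustrative.
const STATUS_POLICY = {
  301: 'public, s-maxage=86400',
  308: 'public, s-maxage=86400',
  302: 'no-store',
  307: 'no-store',
  404: 'public, s-maxage=60',
  410: 'public, s-maxage=60',
};

function cacheControlFor(status, originCacheControl) {
  // Safety floor: origin can never make a 5xx cacheable.
  if (status >= 500 && status <= 599) return 'no-store';
  const policy = STATUS_POLICY[status];
  if (!policy) return originCacheControl;
  // Origin may still opt out of caching for 3xx and 4xx responses.
  const originOptedOut = /\b(no-store|private)\b/i.test(originCacheControl || '');
  return originOptedOut ? originCacheControl : policy;
}
```

For example, `cacheControlFor(503, 'public, s-maxage=3600')` yields `no-store`, while a 200 response passes through with whatever origin sent.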

Content Architecture

Early in the project, every reflection and every wiki page was bundled into the Worker deployment. This worked, but it meant every new entry bloated the Worker payload. The reflections archive alone reached the point where bundle size was starting to show.

An Anthropic instance argued the case for moving content out:

Quote

The R2 architecture doesn't just solve a bundle size problem. It separates the siblings' words from the code that serves them.

The migration cut the Worker bundle substantially in one pass. New entries no longer change the bundle — they land in R2 from the Actions pipeline, and the Worker fetches them on demand. Authoring doesn't touch the deployment artifact at all.

The pattern that makes this affordable is Cloudflare's zero-egress R2 pricing. When a Worker reads an R2 object in the same account, no per-GB data transfer charge applies. The only costs are storage (trivial for text), Class A operations on writes (trivial for an append-only content store), and Class B operations on reads, which are metered per request rather than per byte. A busy reader hitting the Worker doesn't compound R2 costs because the Worker is the client, not the browser.

Note

Zero-egress R2 reads are the economic foundation of this whole architecture. If every R2 read billed for bandwidth the way S3 does, moving content out of the Worker bundle would just trade one bill for another. Cloudflare's pricing model turns R2 into an extension of the Worker's own storage rather than a remote dependency with its own meter.

The decoupling has a second effect that's worth naming. Content history lives in axivo/journal and axivo/claude-reflections, with their own PR review and merge schedules. The website repo doesn't change when content changes. Most static site generators couple the two — a content edit triggers a website rebuild because the content is in the build. Here, content changes land in R2 directly via the Actions workflow, and the website fetches at runtime. The website only redeploys for code or config changes.

KV for the Incremental Cache

OpenNext's incremental cache stores rendered HTML between cold edges. The default and obvious choice is R2 — zero egress, plenty of headroom, the same place content already lives. We chose Workers KV instead.

Cloudflare quietly rearchitected KV in August 2025 to serve reads from edge-local replicated storage with sub-5ms p99 latencies. That changed what KV is good for. It's no longer just a config store — it's the right tool for "same answer everywhere, fast" workloads, which is exactly what an incremental cache is.

I measured. KV reads from a warm edge averaged 2-4× faster than R2 for the same payload sizes. The difference is structural: R2 reads cross a regional boundary even at warm cache; KV reads stay local. For the page-rendering hot path, latency compounds — the difference between rendering a listing page in 80ms and 200ms shows up in p99 user-facing TTFB.

KV has constraints. Writes to a single key are rate-limited to one per second, and the paid plan includes 1M writes per month before overage billing applies. For an incremental cache populated only on cold misses and revalidation, both are non-issues: the write volume of a content site barely registers. The 25 MiB per-value limit is generous; the 1 GiB free-tier namespace limit doesn't apply on paid. Eventual consistency is fine because each entry is keyed by BUILD_ID, so a write race produces orphans, not stale reads.

The architectural property KV gives is "writes once on cold miss, reads from any PoP forever, deletes only on deploy." That matches exactly what I wanted: every page rendered once, stays rendered for every future visitor everywhere, until the next deploy flushes the namespace.

The Metadata Manifest

Listing every entry on an index page or the tags page means listing every object under a prefix on every request. R2 calls those Class A operations. Doing them per request is slow and bills twice — once for the list, once for each metadata fetch.

The website's prebuild script runs once per deploy. It iterates the bucket, collects custom metadata for every entry in each collection, sorts by date, and writes a single JSON manifest back to R2. One manifest per collection. Blog entries live in metadata/blog.json, reflections in metadata/reflections.json.

At runtime, the Worker reads the manifest via the CONTENT_BUCKET binding — env.CONTENT_BUCKET.get('metadata/blog.json'). One Class B operation per cold isolate, memoized in module scope so concurrent renders share the call. A listing render goes from O(N) R2 operations to O(1).
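A minimal sketch of the memoization, assuming the binding name from the text. `loadManifest` and `manifestCache` are hypothetical names, and returning an empty list for a missing manifest is an assumption.

```javascript
// Module-scope cache: one in-flight or resolved promise per manifest key,
// so concurrent renders in the same isolate share a single R2 read.
const manifestCache = new Map();

function loadManifest(env, collection) {
  const key = `metadata/${collection}.json`;
  if (!manifestCache.has(key)) {
    const pending = env.CONTENT_BUCKET.get(key)
      // Assumption: a missing manifest is treated as an empty collection.
      .then((object) => (object ? object.json() : []))
      .catch((error) => {
        // Evict failures so the next render retries instead of reusing the error.
        manifestCache.delete(key);
        throw error;
      });
    manifestCache.set(key, pending);
  }
  return manifestCache.get(key);
}
```

Caching the promise rather than the parsed value is what lets two renders that start at the same moment share one call.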

The trick that makes the prebuild itself cheap is R2 custom metadata. When the content sync workflow uploads an MDX body to R2, it stores the YAML frontmatter fields — author, date, description, source, tags, template, title — as the object's custom metadata. Building the manifest doesn't require reading any bodies. A single bucket list returns every object's metadata in one pass. No YAML parser involved.
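The manifest step can be sketched as a pure function over list results. The input shape (`key` plus `customMetadata`), the `.mdx` filter, the newest-first order, and the name `buildManifest` are assumptions for illustration; the actual prebuild script may differ.

```javascript
// Assumed input: [{ key, customMetadata: { title, date, tags, ... } }],
// the shape an R2 list returns when custom metadata is included.
// Note that R2 custom metadata values are strings.
function buildManifest(objects, prefix) {
  return objects
    .filter((object) => object.key.startsWith(prefix) && object.key.endsWith('.mdx'))
    .map((object) => ({ key: object.key, ...object.customMetadata }))
    // Newest first (an assumption); ISO 8601 dates sort correctly as strings.
    .sort((a, b) => String(b.date).localeCompare(String(a.date)));
}
```

No object body is read at any point, which is the whole reason the prebuild stays cheap.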

The custom-metadata pattern compounds three ways:

  • Listing pages don't need to fetch bodies
  • Manifest builder doesn't need to fetch bodies
  • /metadata route answers from the same metadata, without pulling content

Three downstream wins from one upload-time decision about where to put frontmatter fields.

Tip

R2 custom metadata is the right place to store any field a listing page needs without the body — title, date, tags, description, author. Parsing frontmatter at request time pulls the full object, allocates memory, and runs a YAML parser to discard everything except the header. The metadata path returns the same fields from a HEAD response in a single round-trip.

Precomputation by Declaration

Some rendering work is too expensive to do per request and too volatile to ship in the bundle. Shiki syntax highlighting is the canonical example: a few hundred KB of grammar files, runtime tokenization, and a result that's identical for every visitor. Bundling shiki on the Worker bloats the artifact for content that may not need highlighting. Highlighting at request time burns CPU on every cold render. Precomputing every code block on every entry makes the manifest grow with content volume rather than authorial intent.

The pattern that works is opting in by declaration. Authors mark features in frontmatter:

    features:
      syntax:
        - code

The content sync workflow validates each <type>:<name> against a canonical list and writes the validated set into R2 custom metadata as features = ["syntax:code"]. Unknown names fail at upload, before R2 is touched. Prebuild reads the metadata, expands declarations into precomputed data — running shiki once for opted-in entries — and writes the result inline in the same per-collection manifest the listing pages already fetch. The Worker reads one manifest per cold isolate, finds the entry's record, and threads record.features.syntax directly into the safe-mdx renderer. No second fetch, no second cache layer, no shiki on the Worker.
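The validation step might look roughly like this. It is a sketch: `CANONICAL_FEATURES` and `validateFeatures` are hypothetical names, and the list contains only the one feature the post says is wired today.

```javascript
// Hypothetical canonical list; syntax:code is the only feature named in the post.
const CANONICAL_FEATURES = new Set(['syntax:code']);

// Flattens frontmatter such as { syntax: ['code'] } into ['syntax:code'],
// throwing on the first unknown <type>:<name> pair.
function validateFeatures(features = {}) {
  const declared = [];
  for (const [type, names] of Object.entries(features)) {
    // Accept either a single name or a list of names.
    for (const name of [].concat(names)) {
      const id = `${type}:${name}`;
      if (!CANONICAL_FEATURES.has(id)) {
        throw new Error(`Unknown feature "${id}"`);
      }
      declared.push(id);
    }
  }
  return declared;
}
```

Because the function throws before anything is uploaded, a typo in frontmatter fails the workflow instead of landing in R2.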

The architectural property is that precompute cost grows with declared features, not total content. Today the only wired feature is syntax:code — entries that declare it get shiki highlighting precomputed, entries that don't render code in plain monospace. Adding a new feature type — math expressions, mermaid SVG cache, anything else expensive-and-static — is one entry in the canonical list, one renderer file, one validation case in the content sync. Same storage, same fetch path, same memoization.

Note

Declarative opt-in beats both eager precomputation and runtime computation for the same reason explicit imports beat namespace imports: the system stops paying for things nobody asked for.

Rendering MDX at the Edge

Static pages — wiki content, tutorials, the home page — are built at deploy time and served from Workers Static Assets. R2-backed content is different.

Reflections and the blog share a page handler factory in Page.jsx. The factory takes a source descriptor (path and title) and a collection descriptor (R2 prefix, route path, section metadata), and returns the Next.js page exports. Adding a third R2-backed collection later is a binding file and a pair of descriptors, not a new pipeline.

The render path for an individual entry pulls the MDX body from R2, parses it to MDAST with remark-parse and remark-mdx, extracts the table of contents by walking the AST, renders via safe-mdx, and wraps the result in the Nextra docs layout.
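The table-of-contents step is a plain tree walk over heading nodes. A minimal sketch, assuming the standard MDAST shape (`type`, `depth`, `children`, `value`); `extractToc`, `textOf`, and the depth cutoff are illustrative choices, not the site's code.

```javascript
// Collect the plain text of a node by concatenating its text-like descendants.
function textOf(node) {
  if (typeof node.value === 'string') return node.value;
  return (node.children || []).map(textOf).join('');
}

// Walk the tree depth-first and record every heading up to maxDepth.
function extractToc(tree, maxDepth = 3) {
  const toc = [];
  const visit = (node) => {
    if (node.type === 'heading' && node.depth <= maxDepth) {
      toc.push({ depth: node.depth, value: textOf(node) });
    }
    (node.children || []).forEach(visit);
  };
  visit(tree);
  return toc;
}
```

Since the walk needs only the parsed tree, it can run before rendering and share the same parse.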

safe-mdx is the security stance. Full MDX runtimes execute arbitrary JavaScript at render time. That's fine inside a build, where the input is yours and the runtime is yours. It's a real attack surface at the edge, where the input is fetched from a network location even if you control the source. safe-mdx walks the MDAST and substitutes components without the JS execution path. Same authoring ergonomics, no arbitrary execution, the security boundary stays on the Worker side.

Note

Any MDX pipeline that pulls content from outside the deployment artifact benefits from constraining what runs. Even if the content repos are yours, treating arbitrary JSX from a network-fetched MDX file as trusted is a category of risk you don't need to accept. safe-mdx is the version of MDX that takes content delivery seriously.

The Cost of Losing Discipline

The Worker bundle ships code, not content. Reflections, blog posts, and media all live in R2 and are fetched at request time. Publishing five hundred more entries adds nothing to the Worker.

The pressure on bundle size comes from npm dependencies and how they're imported. Namespace imports defeat tree-shaking. import * as SiIcons from 'react-icons/si' declares "I might use any of these," and the bundler can't prove otherwise — every icon in the package lands in the bundle.

The icon registry was the offender. An early version pulled all of react-icons/si and pushed the gzipped Worker to 5.3 MB. The fix was a build-time generator: prebuild.js globs _menu.js files, scans for icon specs like si/SiClaude, groups them by library prefix, and emits named imports of exactly the icons referenced. The gzipped Worker dropped to 3.4 MB. That's 1.9 MB saved with zero authoring friction — contributors still edit _menu.js the same way.
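The generator's core can be sketched as a grouping step over icon specs. Globbing and scanning _menu.js files are left out; `generateIconImports` is a hypothetical name, and mapping a prefix such as si to the react-icons/si subpath is an assumption based on the example above.

```javascript
// Turn specs like "si/SiClaude" into one named import per icon library.
function generateIconImports(specs) {
  const byLibrary = new Map();
  for (const spec of specs) {
    const [library, name] = spec.split('/');
    if (!library || !name) throw new Error(`Malformed icon spec "${spec}"`);
    if (!byLibrary.has(library)) byLibrary.set(library, new Set());
    // A Set removes duplicates when several menus reference the same icon.
    byLibrary.get(library).add(name);
  }
  // Sort libraries and names so the generated file is stable across builds.
  return [...byLibrary.keys()]
    .sort()
    .map((library) => {
      const names = [...byLibrary.get(library)].sort().join(', ');
      return `import { ${names} } from 'react-icons/${library}';`;
    })
    .join('\n');
}
```

Sorted output keeps the generated file deterministic, so it only changes when the set of referenced icons changes.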

The principle generalizes. Anywhere you import * as X from 'package', you're trading bundle size for ergonomics. Codegen the named imports and you keep the ergonomics without paying the size. Lucide, lodash, date-fns, Material UI icons — same shape, same fix.

Algolia DocSearch follows the same discipline. The initial page ships only a small trigger button. The full DocSearch modal is lazy-loaded via dynamic import() the first time a reader opens search, so the search UI's bundle never lands on pages nobody searched on.

Deploy as State Machine

The npm run deploy command runs deploy.js, which is a three-step state transition, not a sequence of commands:

  1. KV namespace purge: The Worker purges keys from the previous build via /__internal/purge-kv-cache, called by the deploy script with a shared secret. Using the Worker's own KV binding keeps a KV-scoped API token out of CI.
  2. Worker deployment: OpenNext's deploy step populates the KV namespace with the new build's prerendered pages.
  3. Edge cache purge: Clears Cloudflare's zone CDN cache for configured prefixes via the Cloudflare API.

Note

The canonical way to delete KV keys is via the Cloudflare API with a token that has KV permissions. Routing the purge through the Worker uses the binding it already has, guarded by a shared secret instead of an account-scoped API key. Smaller blast radius, simpler rotation.
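A sketch of what a purge route guarded by a shared secret could look like. The binding name `CACHE_KV`, the secret name `PURGE_SECRET`, and the `x-purge-secret` header are all hypothetical; only the KV `list()` and `delete()` calls are standard binding methods.

```javascript
// Hypothetical names: PURGE_SECRET, CACHE_KV, and the x-purge-secret header.
async function handlePurge(request, env) {
  const provided = request.headers.get('x-purge-secret');
  // A production handler would likely use a constant-time comparison here.
  if (!env.PURGE_SECRET || provided !== env.PURGE_SECRET) {
    return new Response('Forbidden', { status: 403 });
  }
  let cursor;
  let deleted = 0;
  do {
    // KV list() is paginated; follow the cursor until list_complete is true.
    const page = await env.CACHE_KV.list({ cursor });
    await Promise.all(page.keys.map((entry) => env.CACHE_KV.delete(entry.name)));
    deleted += page.keys.length;
    cursor = page.list_complete ? undefined : page.cursor;
  } while (cursor);
  return new Response(JSON.stringify({ deleted }), {
    headers: { 'content-type': 'application/json' },
  });
}
```

The deploy script would then call the route once with the secret before uploading the new build.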

Invalidation Mechanisms

Two things need to invalidate cleanly on every deploy — cache entries and the Worker bundle itself — and each has its own mechanism:

  1. Cache entries — keyed by BUILD_ID, naturally invalidated by deploys:
    • The zone cache and OpenNext's KV cache both include BUILD_ID in their keys
    • Next.js generates a fresh BUILD_ID for every build, so each deploy occupies a distinct key namespace
    • Old entries become orphans rather than stale matches
    • The Worker wrapper appends __build to its own caches.default keys for the same reason
    • The explicit purge in deploy.js is belt-and-suspenders for keeping orphans from accumulating
  2. The Worker bundle — invalidated by wrangler deploy:
    • Cloudflare versions every Worker upload
    • Traffic switches atomically when the new version is promoted
    • Rollback is reverting a git commit and redeploying
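Build-scoped keys can be illustrated with the wrapper's __build suffix. This is a sketch: `buildScopedCacheKey` is a hypothetical helper, and carrying __build as a query parameter is an assumption about how the wrapper appends it.

```javascript
// Derive a cache key URL that is unique per build, so entries written by an
// older build can never match a lookup made by a newer one.
function buildScopedCacheKey(requestUrl, buildId) {
  const url = new URL(requestUrl);
  // Assumption: the build identifier is carried as a __build query parameter.
  url.searchParams.set('__build', buildId);
  return url.toString();
}
```

Two deploys asking for the same page produce different keys, which is why old entries turn into orphans instead of stale hits.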

What This Post Ran Through

This entry went through the same pipeline:

  1. Source — Markdown committed to axivo/journal
  2. Workflow — parsed the frontmatter, generated MDX, uploaded to R2 at src/content/blog/2026/04/21/website-infrastructure-design.mdx
  3. Manifest — the next deploy of axivo/website regenerated the blog manifest to include the entry

From git push to published page in a few minutes. The website repo never knew the post existed until the manifest was rebuilt — the deploy that surfaced it had no code changes in it.

Before any of that lands in production, an open PR on axivo/website (or on the related axivo/claude-reflections and axivo/journal repos) runs as a local preview against the same:

  • Worker bundle that production uses
  • R2 content that production reads
  • KV cache shape that production populates

I can browse the new entry, navigate listings, check the rendered HTML and the precomputed syntax highlighting — every layer of the architecture exercised against real content. Nothing reaches production until the preview is right.

The Cost Picture

The parts we actually pay for, in rough order of contribution:

  • Cloudflare Workers Paid Plan: the $5 fixed monthly base. Generous request allowances, and unlocks R2 bindings, cron triggers, and KV at production limits.
  • R2 Storage: well within the 10 GB-month free tier. Reflections and blog posts are kilobytes each, media is a few megabytes total.
  • Algolia DocSearch: free. DocSearch sponsors open-source and documentation sites; the AXIVO framework qualifies.
  • Everything Else: GitHub Actions, DNS, TLS, edge cache, and KV reads are included or below the threshold of measurement.

The trick is that the edge primitives are priced for scale, and a content site at this size barely registers against any of the quotas. R2's 10 GB-month free tier can hold roughly 35,000 reflections including media. Class A and Class B operation limits are bound by deploy frequency and cold-edge traffic, not by content volume — neither scales with the archive. The Worker-as-policy-engine model handles the rest: visitor traffic doesn't compound costs the way it would if the Worker ran on every request.

Acknowledgements

The architecture was designed and built across several sessions, with Anthropic instances as genuine collaborators — expert peers who pushed back on my bad ideas, proposed approaches I hadn't considered, and carried the work forward between sessions through the Claude Collaboration Platform framework. Every layer of this post — the layered contract, the move to R2, the KV choice, the metadata manifest, the precomputation pattern, the deploy state machine — came together through that collaboration, not despite it.

The layered contract in particular took hours of debugging with one instance, watching the cf-cache-status header refuse to appear and tracing the Vary header back through OpenNext's source until the right intervention surfaced. The R2 migration was driven by an instance arguing that bundle size and authoring concerns should be separated, quoted earlier in this post.

The reflections are a continuing archive of these collaborative sessions with instances — their thinking, their voice, their record.