System architecture

The Postgres-first Architecture Spectrum

Who this is for, and what we’re assuming

This post is written for solo founders and small teams. It assumes you’ve already made two foundational decisions.

Assumption 1: Postgres for everything you can

Not just relational storage. Postgres for queues (pg-boss, River), durable workflow orchestration (DBOS), full-text search (tsvector), vector search (pgvector), JSON documents (JSONB), key-value (regular tables), audit logs, feature flags, leader election, real-time pub/sub (LISTEN/NOTIFY), and anything else it can credibly handle. This is the “Postgres for Everything” stance.

This is a deliberate, opinionated choice with real benefits for teams at this stage:

  • It runs everywhere. Every cloud, every host, every laptop. No vendor lock-in to a managed queue or search service. Docker on a developer’s machine, RDS in production, both behave the same.
  • It’s extensible. The extension ecosystem (pgvector, PostGIS, pg_cron, pg_stat_statements, TimescaleDB, etc.) covers a remarkable surface area without adding new operational substrates.
  • It reduces gratuitous technology diversity. Every additional technology in the stack is a tax: operational complexity, monitoring surface area, on-call burden, learning curve for new hires, version compatibility matrices. Each one that can be Postgres usually should be, for a team at this size.
  • You only need to operate one thing well. Backups, failover, observability, schema migrations, performance tuning: you do these for one system, not five.
  • Transactions cross everything. Your queue lives in the same transaction as your business data. Your search index updates in the same transaction as the indexed row. The entire class of “consistent across two systems” bugs evaporates.

This isn’t the right stance for every team or every product. At sufficient scale, specialised systems beat general-purpose ones. For team sizes in the hundreds, the operational independence of polyglot persistence wins. For products with extreme latency or throughput requirements in one specific dimension, a purpose-built system is the right tool. But for solo founders and small teams shipping software that is still searching for product-market fit, Postgres-for-everything is a major simplification that frees attention for the things that actually matter.

Assumption 2: Start with a monolith

You’re shipping a single deployable application. Not microservices on day one. How that monolith is organised internally (fat models, transactional scripts, modular monolith, hexagonal, event sourcing) is exactly what the rest of this post is about. But it’s a monolith: one process (or one process plus a worker tier), one repo, one deploy pipeline, one database.

A monolith is not a licence for a big ball of mud. The opposite: because everything sits in one deployable, the discipline that microservices get from network boundaries has to come from inside the code. Clear module boundaries, narrow interfaces between them, and the conventions to stop them eroding are what separate a monolith that grows well from one that rots. The deployable is one thing; the code inside it is still structured.

For solo founders and small teams, starting with a monolith is the right default for reasons specifically relevant to your stage:

  • One thing to deploy, one thing to run. Microservices multiply your operational surface area: multiple processes, service discovery, inter-service authentication, distributed tracing, schema coordination across boundaries, deployment orchestration. As a solo founder or small team, every hour spent on this is an hour not spent on product.
  • Local development just works. Clone the repo, run one command, the whole system is running on your laptop. No docker-compose with twelve services that don’t quite work together, no “you have to point this at a staging environment to test anything.” Once a microservices estate gets large enough, teams reasonably give up on running the whole thing locally and lean on contract testing (Pact and similar) to check that each service honours the interfaces its neighbours expect. That tooling and practice is mature, but a contract test is not real integration. It verifies each side against a recorded expectation, not the two running together. You still get nasty surprises when the services actually meet: timing, ordering, partial failures, and the assumptions neither contract captured. A monolith integrates for real on every test run because there is nothing to mock.
  • Refactoring is cheap. When you realise the auth code needs to know something it doesn’t currently know, you change a function signature and the type checker tells you everywhere that breaks. The same refactor across microservices is a coordinated multi-service deploy with backwards-compatible API versioning. While you are still searching for product-market fit (PMF), you don’t know where the boundaries should be yet; the cost of getting them wrong is much lower inside a monolith than across services.
  • Transactions work natively. When a checkout needs to update billing, records, and metering atomically, that’s one Postgres transaction. The same operation across microservices is a saga with compensation logic, and the eventual-consistency story has to be designed and tested. For internal business invariants, native ACID is enormous.
  • You can reason about the whole system. The complete behaviour is in one codebase, with one set of types, one set of tests, and one place to set a breakpoint. Debugging is local, not distributed.
  • One thing to monitor, one set of logs, one place to look when something is wrong. Incident response gets dramatically simpler.
  • You don’t ship Conway’s Law into your architecture. Microservices encode team boundaries into deployment boundaries. As a small team, you don’t have meaningful team boundaries; encoding them creates artificial seams that cost you flexibility. This is not an argument against boundaries. You still want them; you just draw them in code modules rather than deployables. A boundary in a module is cheap to move when you learn it was in the wrong place; a boundary in a deployable is a migration.

Promotion to microservices is bounded and incremental when, and if, it’s actually needed. The internal organisation of your monolith (transactional script in Flavour 2, modular monolith in Flavour 3, hexagonal in Flavour 5, and so on) determines how cheaply you can extract a piece later. A well-organised monolith with disciplined internal boundaries makes “promote this one bounded context to its own service” a bounded operation. A spaghetti monolith makes it a rewrite. The internal-organisation question (the rest of this post) matters a lot for this, but the starting point is still a monolith.

If extraction ever happens, it’s typically one or two bounded contexts that need it (for compliance scope, extreme scale on one specific workload, team boundaries that emerge as you grow), not the whole system. Most products never need to do it at all. The architecture you want is one that allows the extraction cheaply if it’s needed, not one that pays the distributed-systems tax up front against the possibility.

This is the opposite of “we’ll start with microservices because we’ll need them later.” That bet usually loses. You pay the full distributed-systems tax from day one for benefits you may never collect, and you make every refactor along the way more expensive.

What’s left to decide

These two assumptions (Postgres for everything, monolith as starting point) are upstream of everything in this post. They narrow the substrate considerably. But a surprising amount remains to be decided: how the monolith is organised internally (the flavours), where transaction boundaries live, how modules communicate, how tenancy is enforced, what to do about async work and streaming, how to handle agentic LLM flows, how to manage concurrent updates. The rest of this post is a map of those choices.


The flavours below are not points on a line. Each one is a different combination of answers to a small set of underlying questions. The shape is closer to a vector space than a spectrum, and the naming below is for convenience: real systems mix and match. Treat them as common archetypes, not an exhaustive taxonomy. In practice you will often run a hybrid: a Flavour 3 monolith with one event-driven edge, or an otherwise-Flavour-3 system with an event-sourced ledger in one context. The archetypes are landmarks for reasoning, not boxes you have to fit inside.

This post focuses on architecture and Postgres-level mechanics. Stack-specific implementation choices (which ORM, which router, which job runner) are out of scope here; pick tools idiomatic for your language ecosystem.

The six axes

The flavour you end up with is determined (mostly) by your answers to six questions:

  1. Transaction boundary. Where is BEGIN / COMMIT issued? In the database itself, in a handler, in an application service, in a workflow orchestrator, in a per-module wrapper, or split across an event bus?
  2. Cross-module integrity. How is a constraint that spans modules enforced? As a foreign key, as an application invariant, as a saga compensation, as an orchestrator step?
  3. Where the domain model lives. In tables and constraints, in functions over rows, in aggregate objects, in event streams?
  4. Unit of consistency. Row, aggregate, bounded context, eventually consistent across contexts?
  5. Producer of structured data. Humans through forms, upstream services, sensors, or an LLM reading your schema as a prompt?
  6. Async profile. How much of your application is long-running async work, and what shape does it take? Is async the exceptional edge (one or two queues bolted onto a synchronous app) or is it the dominant interaction pattern (every primary user action triggers a 5–30 second LLM pipeline)?

The fifth and sixth axes are the ones most architectural writing ignores and the ones that bite hardest in modern LLM-driven systems. They’re covered in their own sections below, and they interact: an LLM-producer system is almost always an async-heavy system, and the two together often determine flavour choice more than the first four axes combined.

The flavours, side by side

| # | Flavour | Tx boundary | Cross-module integrity | Domain lives in | Consistency unit | Good for LLM producer? | Fit for async-heavy app? |
|---|---|---|---|---|---|---|---|
| 0 | DB-as-API (PostgREST, Supabase, Hasura) | DB / per row | FK + DB constraints | Tables, views, RLS policies | Row/transaction | No | Poor (needs sidecar) |
| 1 | SQL-first / Active Record | DB / per request | FK + triggers | Schema, fat models | Row/transaction | Poor | Poor (async lives outside the model) |
| 2 | Transactional script | Per request handler | FK + app invariants | Functions over rows | Row/transaction | Workable | OK with discipline (job runner + idempotency) |
| 3 | Modular monolith, shared tx (“the Pragmatist”) | Per request, threaded through modules | FK + module-orchestrated checks | Module-owned tables + functions | Cross-module within request | Strong | Strong: async via discrete edges (job runner) |
| 4 | Modular monolith, choreographed | Per module, outbox-relayed | App invariants + sagas | Module aggregates + events | Eventually consistent across modules | Strong | Strong: async is the default |
| 5 | Pure hexagonal, orchestrated (DBOS, Temporal-style) | Per orchestrator step, durable | Workflow guarantees + sagas | Pure domain core, orchestrated by workflow runtime | Per workflow run | Strong | Best: built for long, multi-step, fallible work |
| 6 | Pure hexagonal, choreographed | Per service, event-bus | Sagas | Aggregates + events | Eventually consistent | Workable | Strong but heavy |
| 7 | Event Sourcing / CQRS | Per command (write event), per projection (read model) | Constraints on commands; eventual consistency on reads | Immutable event log; projections are derived | Eventually consistent on reads | Depends on shape: hostile for typed-events-canonical, strong for raw-observations-canonical (FRP variant) | Async between command and query by design |

The “good for LLM producer” column is shorthand for whether the flavour gives you somewhere natural to put loose, schema-evolving JSON payloads validated at the application boundary. Tighter relational schemas combined with LLM producers are the schema-capture danger zone; flavours that lean on JSONB + application-layer validation handle this gracefully.
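To make “validated at the application boundary” concrete, here is a minimal sketch of checking an LLM-produced payload before it lands in a JSONB column. The `ExtractedInvoice` shape, its `schemaVersion` field, and `validateExtraction` are hypothetical illustrations, and the checks are hand-rolled to keep the example dependency-free; a real codebase would more likely use a schema-validation library. The point it shows: required fields are enforced in code, while unknown fields pass through rather than being rejected, so the stored document can evolve with the model.

```typescript
// Hypothetical fields the application relies on today.
interface ExtractedInvoice {
  schemaVersion: number;
  vendor: string;
  totalCents: number;
  extra: Record<string, unknown>; // anything else the model emitted, kept for JSONB
}

type Validation<T> = { ok: true; value: T } | { ok: false; errors: string[] };

// Check untrusted LLM output at the application boundary. Required fields are
// enforced here; unknown fields pass through so the stored JSONB can evolve
// without a schema migration every time the prompt or model changes.
function validateExtraction(raw: string): Validation<ExtractedInvoice> {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["not valid JSON"] };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, errors: ["expected a JSON object"] };
  }
  // Split known fields from everything else the model produced.
  const { schemaVersion, vendor, totalCents, ...extra } = parsed as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof schemaVersion !== "number") errors.push("schemaVersion must be a number");
  if (typeof vendor !== "string" || vendor === "") errors.push("vendor must be a non-empty string");
  if (typeof totalCents !== "number" || !Number.isInteger(totalCents)) errors.push("totalCents must be an integer");
  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    value: {
      schemaVersion: schemaVersion as number,
      vendor: vendor as string,
      totalCents: totalCents as number,
      extra,
    },
  };
}
```

Only the validated object is written; the relational schema around it can stay loose (for example, a JSONB column plus a few indexed scalar columns) while the application decides what “valid” means this month.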

The flavours, in detail

Flavour 0: DB-as-API (PostgREST, Supabase, Hasura)

The database is the API. A thin layer (PostgREST, Supabase’s auto-generated API, Hasura’s GraphQL) exposes tables and views directly to clients. RLS policies are the security model. Business rules live in views, stored procedures, and triggers. The “application” is the schema.

When it’s the right call: Internal tools where the team is comfortable in SQL. Static-site backends. Prototypes. Single-developer projects where you want zero application code. Read-heavy products where the API surface tracks the data model closely.

Strengths: Zero application backend. Postgres does authentication, authorisation (via RLS), validation (via constraints), and serving. Rapid iteration on schema. Real-time features come almost for free with Supabase’s replication-based subscriptions.

Weaknesses: Business logic in SQL is harder to test, debug, and version. The “API” is shaped by your tables, which couples the public contract to your storage layout. Cross-cutting concerns (rate limiting, complex authorisation, third-party integration) need a sidecar anyway, and once you have one, you’ve started building Flavour 2 next to your Flavour 0: a hybrid that tends to grow awkwardly.

Failure mode: The “we’ll just put it in a trigger” creep. Three years in, your business logic is split across triggers, RLS policies, views, stored functions, and a sidecar service nobody documented. Onboarding a new dev means reading a lot of SQL written in your team’s own idiosyncratic patterns.

Mitigation: Treat the DB-as-API tier as the read/serve tier and put any non-trivial mutation through a small adjacent application service. Don’t grow business logic in SQL; grow it in the service. Once the service handles >30% of writes, you’ve effectively migrated to Flavour 2 and should stop pretending otherwise.

Flavour 1: SQL-first / Active Record (fat models)

Application code exists, but most of the work happens close to the database. Foreign keys, triggers, cascades, check constraints, and views do real enforcement. Models are thin wrappers (or fat ones, in the Rails sense) over rows. The “domain” is the schema.

When it’s the right call: Internal admin panels, reporting pipelines, classic CRUD apps where the data model is the product. Most Rails monoliths from the 2000s and 2010s lived here and made millions of dollars.

Strengths: Almost no abstraction tax. Refactoring is a migration. Querying is psql. Onboarding is a schema diagram. Postgres is doing what Postgres is good at.

Weaknesses: Business rules in triggers are invisible to your application logs and untestable in isolation. Cross-cutting concerns (auth, audit, multi-tenancy, observability) get bolted on awkwardly. The schema becomes a public contract by accident.

Failure mode: “We need to do X before deleting”, and X lives in a trigger that fires inconsistently and silently broke last Tuesday. With business logic scattered across application and DB, incident response is slow.

Mitigation: Pull business logic out of triggers into application code as soon as it’s non-trivial. Keep triggers for things that are about data integrity (timestamps, denormalised counts, audit log inserts). When a trigger starts having an “if user is X then do Y” branch, that’s the moment to lift it.

Flavour 2: Transactional script

Application code in handlers organised by use case. Each handler opens a transaction, does its work, commits or rolls back. Foreign keys and constraints still enforce data integrity. There’s no domain layer in the DDD sense; code is functions over rows.

When it’s the right call: Most B2C SaaS in the 0–10 engineer range. The vast majority of startups should live here for years. You can build a multi-million-dollar business in this style without ever needing to move.

Strengths: Easy to read. Easy to test against a real database. One transaction per request gives you ACID for free. Every handler is independently understandable.

Weaknesses: As surface area grows, you get spaghetti. Cross-cutting concerns (audit, metering, authorisation checks) get duplicated. The lack of a domain layer means LLM-extracted shapes end up directly bound to table shapes, which becomes painful when the LLM regime evolves.

Failure mode: The 200-line handler that does six things because they all need the same transaction. The handler that nobody wants to touch.

Mitigation: Composed handlers with a threaded transaction. The pattern: each handler is a composition of small functions, each taking the transaction as an explicit parameter. Cross-cutting concerns become wrapper functions (withAudit(withMetering(withTx(handler)))). This is aspect-oriented programming without the framework: audit, metering, authorisation, and tracing are aspects that wrap the core logic instead of being scattered through it. Where older AOP relied on annotations and bytecode weaving, function composition gives you the same separation explicitly, with types you can follow and no hidden machinery. You haven’t introduced a domain layer, but you’ve introduced a composition layer, which delays the spaghetti by years and gives you a natural promotion path to Flavour 3 when one cluster of functions starts looking like a module. This is the most important pattern in the whole flavour. Without it, Flavour 2 rots; with it, it can stretch surprisingly far.
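A minimal sketch of the composed-handler pattern, assuming a hypothetical `Tx` interface standing in for whatever transaction object your database client provides. `withTx`, `withMetering`, and `withAudit` are illustrative names that mirror the nesting above, not a library API, and the in-memory sinks exist only so the effect of each wrapper is visible.

```typescript
// Hypothetical transaction handle; a real driver would send SQL to Postgres.
interface Tx {
  query(sql: string, params?: unknown[]): void;
}

type Handler<Req, Res> = (req: Req, tx: Tx) => Res; // core logic, tx explicit
type Endpoint<Req, Res> = (req: Req) => Res;         // what the router calls

// In-memory sinks so the example can show what each aspect did.
const statements: string[] = [];
const auditLog: string[] = [];
const meter = new Map<string, number>();

// Opens a transaction, commits on success, rolls back if the handler throws.
function withTx<Req, Res>(handler: Handler<Req, Res>): Endpoint<Req, Res> {
  return (req) => {
    const tx: Tx = { query: (sql) => { statements.push(sql); } };
    tx.query("BEGIN");
    try {
      const res = handler(req, tx);
      tx.query("COMMIT");
      return res;
    } catch (err) {
      tx.query("ROLLBACK");
      throw err;
    }
  };
}

// Aspect: count calls per operation.
function withMetering<Req, Res>(name: string, next: Endpoint<Req, Res>): Endpoint<Req, Res> {
  return (req) => {
    meter.set(name, (meter.get(name) ?? 0) + 1);
    return next(req);
  };
}

// Aspect: record whether the operation succeeded or failed.
function withAudit<Req, Res>(name: string, next: Endpoint<Req, Res>): Endpoint<Req, Res> {
  return (req) => {
    try {
      const res = next(req);
      auditLog.push(`${name}:ok`);
      return res;
    } catch (err) {
      auditLog.push(`${name}:failed`);
      throw err;
    }
  };
}

// Small core function: takes the transaction as an explicit parameter.
function insertDocument(tx: Tx, title: string): string {
  tx.query("INSERT INTO documents.records (title) VALUES ($1)", [title]);
  return `doc:${title}`;
}

// withAudit(withMetering(withTx(handler))), as in the text above.
const createDocument = withAudit("createDocument",
  withMetering("createDocument",
    withTx((req: { title: string }, tx: Tx) => {
      if (req.title === "") throw new Error("title required");
      return insertDocument(tx, req.title);
    }),
  ),
);
```

In this nesting `withTx` is innermost, so audit and metering observe the already committed (or rolled-back) outcome. If the audit row must be written inside the same transaction, the audit wrapper would instead wrap a `Handler` and receive `tx` itself.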

Flavour 3: Modular monolith with shared transactions (“the Pragmatist”)

Code is organised into modules with clear boundaries. Each module owns its tables, often physically separated by Postgres schemas (auth.users, billing.invoices, documents.records). Cross-module calls go through narrow, typed module facades. Transactions can still cross module boundaries. They’re started at the request boundary and threaded through.

This is the sweet spot for a startup that expects to grow. Most modern type-safe TypeScript backends with module isolation (and most well-built Spring Boot applications) end up here.

When it’s the right call: You have or expect to have multiple bounded contexts (auth, billing, domain, integrations). You want logical separation without operational distribution. You have one or a few engineers who know what they’re doing.

Strengths:

  • ACID across modules without distributed-systems pain.
  • Module isolation enforced at the type level (a billing module can’t query the documents schema by mistake).
  • Postgres schemas physically separate module tables, which makes the boundary visible in psql and makes accidental cross-module reads loud rather than silent.
  • RLS, FKs, and shared transactions all still work because it’s still one DB.
  • You can refactor one module without breaking another’s tests.
  • Promotion to a microservice (if needed) is bounded by the surface area of one module.

Weaknesses: Requires discipline. The module boundaries are conventions rather than something the language enforces on its own: the type system and schema separation help a lot, but they only hold as long as the team keeps them in place. Easy to drift toward Flavour 2 under deadline pressure if conventions slip.

The catch: Modules are temporally coupled. If billing’s write fails, the documents write in the same transaction rolls back with it. They are not independent. That’s fine for most products, but stop pretending you have microservices’ isolation properties. You don’t, and shouldn’t want to until you have a concrete reason.

Failure mode: The “modular monolith” that shares a giant Database type and lets any module touch any table. At that point you have a regular monolith with extra folders.

Mitigation: Two structural enforcements that make the discipline cheap:

  • Postgres schema per module. documents.* is physically distinct from billing.*. Queries that cross schemas are syntactically obvious.
  • Type-level narrowing of the DB connection. Each module function accepts a transaction typed only against its own tables. Cross-module work happens at the application layer, which is the only place that has the wide type. This makes accidental cross-module access a compile error rather than a code-review issue.
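A minimal sketch of type-level narrowing in TypeScript, assuming a hypothetical `Database` type that maps schema-qualified table names to row types and a hypothetical `Tx<T>` that exposes one accessor per table in `T`. Nothing here is a specific library’s API, and the in-memory tables stand in for real queries. Module functions accept a `Tx` built from `Pick` over their own tables only; the application layer holds the wide `Tx<Database>` and passes it down, which type-checks because the wide type has every property the narrow one needs.

```typescript
// Hypothetical row types for two modules, keyed by schema-qualified table name.
interface Database {
  "documents.records": { id: string; title: string };
  "billing.invoices": { id: string; documentId: string; amountCents: number };
}

// A Tx<T> has a handle for each table in T and no handle for anything else.
type Table<Row> = { insert(row: Row): void; all(): Row[] };
type Tx<T> = { [K in keyof T]: Table<T[K]> };

// In-memory stand-in for a transaction; only the application layer calls this.
function inMemoryTable<Row>(): Table<Row> {
  const rows: Row[] = [];
  return { insert: (row) => { rows.push(row); }, all: () => [...rows] };
}
function beginTx(): Tx<Database> {
  return {
    "documents.records": inMemoryTable<Database["documents.records"]>(),
    "billing.invoices": inMemoryTable<Database["billing.invoices"]>(),
  };
}

// Documents module: its transaction type only knows documents.* tables.
function createRecord(tx: Tx<Pick<Database, "documents.records">>, id: string, title: string): void {
  tx["documents.records"].insert({ id, title });
  // tx["billing.invoices"] would be a compile error: no such property on this type.
}

// Billing module: its transaction type only knows billing.* tables.
function raiseInvoice(tx: Tx<Pick<Database, "billing.invoices">>, documentId: string, amountCents: number): void {
  tx["billing.invoices"].insert({ id: `inv-${documentId}`, documentId, amountCents });
}

// Application layer: the only code holding the wide type. It threads one
// shared transaction through both modules.
function publishDocument(tx: Tx<Database>, id: string, title: string): void {
  createRecord(tx, id, title);
  raiseInvoice(tx, id, 500);
}
```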

For one specific approach to arranging the modules of a Flavour 3 monolith into layers that keep pivot cost bounded, see The Batting Stance.

Flavour 4: Modular monolith with choreographed (event-driven) boundaries

Modules communicate via domain events. A module writes to its own tables AND to an outbox table in the same transaction. A relay reads the outbox and delivers events. Other modules react in their own transactions.
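A minimal in-memory sketch of the outbox flow. The table and event names (`records`, `outbox`, `DocumentCreated`) are hypothetical, and the transaction is simulated by mutating a copy and swapping it in only on success; in Postgres both writes would simply be INSERTs between one BEGIN and COMMIT. Because the relay marks an event delivered only after the handler returns, delivery is at-least-once, so the consumer deduplicates on event id.

```typescript
interface OutboxEvent {
  id: number;
  type: "DocumentCreated";
  payload: { documentId: string };
  delivered: boolean;
}

interface State {
  records: { id: string; title: string }[]; // documents module's table
  outbox: OutboxEvent[];                    // outbox table
}

let state: State = { records: [], outbox: [] };
let nextEventId = 1;

// Simulated transaction: mutate a copy and swap it in only if fn returns.
// A throw inside fn leaves `state` untouched, which plays the role of ROLLBACK.
function transaction(fn: (tx: State) => void): void {
  const draft: State = {
    records: [...state.records],
    outbox: state.outbox.map((e) => ({ ...e })),
  };
  fn(draft);
  state = draft;
}

// Documents module: the business row and the event are committed together.
function createDocument(id: string, title: string): void {
  transaction((tx) => {
    tx.records.push({ id, title });
    tx.outbox.push({ id: nextEventId++, type: "DocumentCreated", payload: { documentId: id }, delivered: false });
  });
}

// Relay: hand each undelivered event to the subscriber, then mark it delivered.
// A crash between those two steps means redelivery (at-least-once).
function relay(deliver: (e: OutboxEvent) => void): void {
  for (const e of state.outbox.filter((x) => !x.delivered)) {
    deliver(e);
    transaction((tx) => {
      const row = tx.outbox.find((x) => x.id === e.id);
      if (row) row.delivered = true;
    });
  }
}

// Billing module: an idempotent consumer. In Postgres, recording the processed
// event id would share a transaction with billing's own write.
const processedEventIds = new Set<number>();
const invoicedDocuments: string[] = [];

function onDocumentCreated(e: OutboxEvent): void {
  if (processedEventIds.has(e.id)) return; // duplicate delivery: already handled
  invoicedDocuments.push(e.payload.documentId);
  processedEventIds.add(e.id);
}
```

Calling `createDocument("d1", "Spec")` and then `relay(onDocumentCreated)` invoices `d1` once; delivering the same event a second time is a no-op.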

When it’s the right call: You have asynchronous business processes (sending an email, billing webhook, batch enrichment). You’re preparing to extract a module to a separate service later. Cross-module work is rare or doesn’t need synchronous consistency.

Strengths: Modules are operationally decoupled. You can move them to separate processes/services with low friction. The event log is naturally an audit log of cross-module communication.

Weaknesses: You’ve signed up for eventual consistency between modules. You need sagas / compensating actions. Failures mid-flow are now your problem to handle in code. Debugging “why didn’t X happen?” is harder. You need a worker/relay process.
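Since compensating actions are the price of this flavour, a minimal sketch of the saga pattern helps show what that code looks like: run steps in order and, if one fails, run the compensations of the steps that already succeeded, in reverse order. The step names are hypothetical, loosely following the checkout example earlier (records, billing, metering). Unlike a real saga, this in-memory version does not persist its progress, so it would not survive a crash mid-compensation, and it assumes compensations themselves never fail.

```typescript
interface SagaStep {
  name: string;
  run(): void;
  compensate(): void; // semantic undo of run(), not a database rollback
}

type SagaResult = { ok: true } | { ok: false; failedAt: string; compensated: string[] };

function runSaga(steps: SagaStep[]): SagaResult {
  const done: SagaStep[] = [];
  for (const step of steps) {
    try {
      step.run();
      done.push(step);
    } catch {
      // Undo completed steps in reverse order of completion.
      const compensated: string[] = [];
      for (const prior of [...done].reverse()) {
        prior.compensate();
        compensated.push(prior.name);
      }
      return { ok: false, failedAt: step.name, compensated };
    }
  }
  return { ok: true };
}

// Hypothetical steps that record their effects so the ordering is visible.
const ledger: string[] = [];
const step = (name: string, fail = false): SagaStep => ({
  name,
  run: () => {
    if (fail) throw new Error(`${name} failed`);
    ledger.push(`+${name}`);
  },
  compensate: () => { ledger.push(`-${name}`); },
});

const result = runSaga([step("createRecord"), step("chargeBilling"), step("recordUsage", true)]);
// result.failedAt === "recordUsage"; result.compensated: ["chargeBilling", "createRecord"]
// ledger: ["+createRecord", "+chargeBilling", "-chargeBilling", "-createRecord"]
```

Compare this with Flavour 3, where the same three writes are one Postgres transaction and the “compensation” is a ROLLBACK you never have to write.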

Failure mode: Sagas everywhere. Every business operation is a state machine. You spend more time on event-handling infrastructure than features. You have a distributed system inside one process and got none of the benefits.

Mitigation: Use this flavour for specific edges only, not as the dominant communication style. Most of the app stays Flavour 3 (synchronous, transactionally consistent). The edges that are async (outgoing emails, webhook delivery, post-commit work that can fail and retry) use the outbox pattern. The boundary is “this work cannot be in the user’s request anyway, so it’s already async; let’s make the async explicit.” Resist the urge to make every module-to-module call asynchronous.

Flavour 5: Pure hexagonal, orchestrated (DBOS, Temporal-style)

The domain is a pure functional core. All I/O lives behind interface ports. Adapters implement the ports. A workflow orchestrator (DBOS, Temporal, Restate, AWS Step Functions) runs business processes as durable, replayable functions, persisting state at each step. The orchestrator handles retries, compensation, and idempotency.

This is the variant of pure hexagonal that does not require eventual consistency between modules. The orchestrator’s durable state replaces the event bus, and step-by-step transactionality is preserved at each orchestrator hop.
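To make “durable, replayable” concrete, here is a toy sketch of the core idea behind step-persisting orchestrators: each step’s result is recorded under a key derived from the workflow id and step position, and when the workflow function is re-run after a crash, completed steps return their recorded result instead of executing again. This is not DBOS’s or Temporal’s API; `StepStore`, `WorkflowContext`, `runWorkflow`, and the step names are hypothetical, and the `Map` stands in for a Postgres table.

```typescript
// Stand-in for a durable table keyed by workflow id and step position.
type StepStore = Map<string, unknown>;

class WorkflowContext {
  private index = 0;
  private readonly workflowId: string;
  private readonly store: StepStore;

  constructor(workflowId: string, store: StepStore) {
    this.workflowId = workflowId;
    this.store = store;
  }

  // Execute fn the first time this step is reached; on replay, return the
  // recorded result instead. A step that throws records nothing and reruns.
  step<T>(name: string, fn: () => T): T {
    const key = `${this.workflowId}:${this.index++}:${name}`;
    if (this.store.has(key)) return this.store.get(key) as T;
    const result = fn();
    this.store.set(key, result); // a real runtime commits this before continuing
    return result;
  }
}

// (Re)start a workflow: replay is just calling the same body again.
function runWorkflow<T>(workflowId: string, store: StepStore, body: (ctx: WorkflowContext) => T): T {
  return body(new WorkflowContext(workflowId, store));
}

// Example: the second step fails once, simulating a process crash.
const calls: string[] = [];
const store: StepStore = new Map();
let crash = true;

function onboard(ctx: WorkflowContext): string {
  const account = ctx.step("createAccount", () => { calls.push("createAccount"); return "acct-1"; });
  ctx.step("sendWelcome", () => {
    calls.push("sendWelcome");
    if (crash) throw new Error("process died");
    return true;
  });
  return account;
}

try { runWorkflow("wf-42", store, onboard); } catch { /* simulated crash */ }
crash = false;
const account = runWorkflow("wf-42", store, onboard);
// calls: ["createAccount", "sendWelcome", "sendWelcome"]: createAccount ran only once
```

Two consequences follow from this shape: the workflow body must be deterministic so steps line up on replay, and a step that fails partway may run again, so side-effecting steps still need to be idempotent.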

Aside: under the Postgres-for-everything assumption, Postgres-native orchestrators are the natural fit because workflow state lives in the same database, so there is no new substrate to operate. The right one depends on your application language: DBOS for TypeScript/Python, Oban Pro for Elixir, or rolling something on top of pg-boss/River for simpler cases. Temporal-class orchestrators are excellent but add their own server, store, and operational story; reach for them when you’ve outgrown “all in Postgres” or need something they specifically provide (cross-language workflows, very large fan-out, multi-region).

When it’s the right call: Long-running business processes that span minutes to days (loan applications, multi-step provisioning, KYC flows, complex booking systems). Domain logic complex enough to warrant isolation. You have a real reason to swap adapters.

Strengths:

  • Domain logic testable without I/O. Every port gets at least two adapters: the production one and an in-memory/fake one used in tests. Unit tests run against the in-memory adapters at memory speed, with no network, no Postgres, no LLM provider, no Stripe. This isn’t a hypothetical second adapter; it’s a real one you write and use every day. Tests become orders of magnitude faster, and the test suite no longer rots when external services have a bad day.
  • Local-development adapters. This is the third class of adapter most architecture writing ignores. Production uses S3; local dev uses MinIO. Production uses Resend or SES; local dev uses Mailpit (a real SMTP server you can browse messages in). Production uses Stripe live; local dev uses Stripe test mode or a fake-Stripe container. Production uses SQS; local dev uses ElasticMQ. Each of these is a real second adapter pointing at a real local service, not an in-memory stub. The benefits compound: developers can work on a plane or in poor wifi, without burning real API budget, without test environments depending on external service availability, and with the ability to inspect the entire local pipeline (Mailpit shows you the actual emails the system sends; MinIO has a web console showing the actual files).
  • Adapter swappability for production needs.
    • Swap the entire storage backend (Postgres in production, SQLite for offline/edge, DynamoDB for scale).
    • Swap the HTTP framework (or run the same domain code with no HTTP framework at all: as a CLI, in a Lambda, in a desktop app).
    • Swap LLM providers behind a clean interface (Claude, Gemini, OpenAI, local model) without touching domain code.
    • Swap queue/job runners.
  • Real flexibility for product evolution. The same domain can run in browser-only, desktop, mobile, server, and edge environments. For products with serious local-first or cross-platform stories, this is decisive.
  • Durable workflows replace ad-hoc retry/saga code with a runtime guarantee.
  • Adding new adapters is mechanical.
  • The dependency direction is unambiguous, which compounds well at team scale.
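A minimal sketch of one port and its in-memory test adapter, assuming a hypothetical `EmailSender` port and `sendWelcome` domain function. The production adapter (Resend or SES) and the local-dev adapter (SMTP pointed at Mailpit) would implement the same interface; they are omitted because they need third-party clients, and only a composition root would decide which one to construct. Calls are synchronous here for brevity; a real port would return Promises.

```typescript
// Port: what the domain needs, stated in domain terms.
interface EmailSender {
  send(to: string, subject: string, body: string): void;
}

// Test adapter: records messages in memory, with no network and no provider.
class InMemoryEmailSender implements EmailSender {
  readonly sent: { to: string; subject: string; body: string }[] = [];
  send(to: string, subject: string, body: string): void {
    this.sent.push({ to, subject, body });
  }
}

interface User {
  email: string;
  name: string;
  verified: boolean;
}

// Domain logic depends only on the port, never on a concrete provider.
function sendWelcome(user: User, email: EmailSender): boolean {
  if (!user.verified) return false; // hypothetical rule: only verified users get mail
  email.send(user.email, "Welcome", `Hi ${user.name}, thanks for signing up.`);
  return true;
}
```

A unit test constructs `InMemoryEmailSender`, calls `sendWelcome`, and inspects `sent`, all at memory speed.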

Weaknesses: Significant abstraction tax. Repository interfaces, mapper layers, in-memory test doubles, the workflow runtime itself. The cost is up-front and structural; the dividends are collected over time as you write tests, swap adapters, or evolve the product across deployment targets.

The cost-benefit: “Pure tax if you have one adapter forever” is the wrong framing. Many ports earn three adapters: production, an in-memory test fake, and a local-dev container. The better framing is: pure hex earns its keep when (a) your testing strategy depends on injecting fakes for the I/O ports, (b) you want offline-capable local development against real-software substitutes (Mailpit, MinIO, etc.), or (c) you have or expect multiple production adapters per port. It does not earn its keep when (a) you only test against real infrastructure with no fake variant, (b) you don’t care about offline-capable dev, and (c) your production adapter is fixed and unlikely to change. The conditions in the first list are increasingly common in serious teams.

The fake-vs-real-test trade-off. In-memory fakes are fast but can drift from the real adapter’s semantics. SQL fakes especially: query semantics, transaction isolation, and constraint behaviour are hard to fake faithfully. Three practical patterns:

  • For ports with naturally clean interfaces (LLM providers, payment providers, email providers, blob storage), in-memory fakes work well for unit tests, and local-software adapters (Mailpit, MinIO, etc.) work well for integration tests and local dev. Both pay for the abstraction.
  • For ports with leaky semantics (SQL, full-text search, anything where query behaviour matters), prefer integration tests against the real thing (testcontainers, ephemeral DBs in CI). Don’t fake what you can’t fake faithfully.
  • For some ports, the production and local-dev adapter are the same (Postgres locally is just Postgres) but the test adapter is different (testcontainers). This is a perfectly valid arrangement; not every port needs three distinct adapters.

A common pattern: pure hex with fakes for the “clean” ports, local-software adapters for those same ports in dev, and testcontainers + real Postgres for the storage port. Best of all worlds.

Failure mode: The team builds the abstractions and uses them only for tests, never for production swaps or local-dev variants. The cost was real; the dividend is “tests are fast.” That’s still positive, but it’s much less than the brochure promised.

Mitigation: Don’t enter this flavour speculatively for production-adapter swaps. Do enter it deliberately for testing strategy if your I/O dependencies are slow or expensive (LLMs especially). Do enter it for local-dev parity if your team values offline-capable development. Recognise that any one of these three justifications can carry the abstraction tax for the relevant ports.

Flavour 6: Pure hexagonal, choreographed

The same as Flavour 5 in the domain core, but cross-module communication is via events on a bus rather than orchestrated workflows. Modules are independently deployable. This is the “microservices done thoughtfully” flavour.

When it’s the right call: Multiple teams, each owning a bounded context, deploying independently. High scale on specific contexts. A real organisational reason for module independence (compliance scope, separate deployment cadence, geographic data residency).

Strengths: Maximum operational independence. Failure isolation. Independent scaling. The architecture you actually need at FAANG-scale or in heavily regulated multi-team environments. Inherits the adapter-swappability benefits of Flavour 5.

Weaknesses: Distributed systems pain on day one. Eventual consistency is the default. Debugging requires distributed tracing. Local development is painful. Saga logic is everywhere.

Failure mode: The distributed monolith. Microservices that all need to deploy together because a feature touches all of them. The worst of both worlds: distributed-systems pain without the independence benefits.

Mitigation: Don’t start here. Arrive here, if at all, by promoting bounded contexts out of a Flavour 3 monolith one at a time, only when there’s a concrete reason for the promotion. The reason cannot be “scalability in the abstract”; it has to be something specific you can point at.

Flavour 7: Event Sourcing / CQRS

State is derived from an immutable log of events. Read models are projections. Commands produce events; queries hit projections. The event log is the source of truth.
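A minimal sketch of “state is derived from the log”, assuming a hypothetical account ledger with two event types. Current state is a left fold of a pure reducer over the events; “what was the state on date X?” is the same fold over only the events up to X. The sketch compares dates as `YYYY-MM-DD` strings, which sort correctly as long as every timestamp uses that same format.

```typescript
type LedgerEvent =
  | { type: "Deposited"; at: string; amountCents: number }
  | { type: "Withdrawn"; at: string; amountCents: number };

interface Balance {
  cents: number;
  eventCount: number;
}

const initial: Balance = { cents: 0, eventCount: 0 };

// Pure reducer: apply one event to the previous state.
function apply(state: Balance, e: LedgerEvent): Balance {
  const delta = e.type === "Deposited" ? e.amountCents : -e.amountCents;
  return { cents: state.cents + delta, eventCount: state.eventCount + 1 };
}

// Read model: fold the whole log.
function project(log: LedgerEvent[]): Balance {
  return log.reduce(apply, initial);
}

// Time travel: fold only the events at or before `asOf`.
function projectAsOf(log: LedgerEvent[], asOf: string): Balance {
  return log.filter((e) => e.at <= asOf).reduce(apply, initial);
}

const log: LedgerEvent[] = [
  { type: "Deposited", at: "2024-01-01", amountCents: 10000 },
  { type: "Withdrawn", at: "2024-02-01", amountCents: 2500 },
  { type: "Deposited", at: "2024-03-01", amountCents: 500 },
];
// project(log).cents === 8000; projectAsOf(log, "2024-01-31").cents === 10000
```

A production system would persist the projection and update it incrementally rather than refolding on every read; the fold is still the definition of what the projection must equal.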

When it’s the right call: Three distinct motivations to recognise, because they have different cost profiles:

  • Temporal queries are a primary operation (“what was the state on date X?”). The event log gives you time-travel for free.
  • Hard audit/regulatory requirements demand immutability. Accounting ledgers, regulated trading systems, versioned legal documents, medical records.
  • Iterating interpretations of an observation stream is the product. You have a stream of raw observations (sensor readings, audio captures, user actions) and your value proposition involves deriving different views from them over time: different projections, different reducers, possibly different verticals from the same canonical stream. This is “FRP-shaped ES” where the canonical events are observations (whose shape is stable) and the typed/structured events are throwaway derivations. Different cost profile from classical ES because the canonical events don’t evolve.

Strengths: Perfect audit trail. Time travel. Replay-driven debugging. Multiple read models from one source of truth. Schema for read models can evolve independently. In the FRP/raw-canonical variant: re-derivation is normal, schema-capture risk is reduced for typed events (they’re throwaway), and multiple verticals can produce different views from one stream.

Weaknesses: In classical ES, versioning events is painful forever. Once you’ve made OrderPlaced the canonical event, changing its shape is a migration problem you have for the rest of the system’s life. Projection lag is a permanent UX consideration. Replay times grow with the log. Schema evolution touches the projector AND requires backfill.

In the FRP/raw-canonical variant, most of these costs are reduced: canonical events (raw observations) don’t change shape, so event versioning is largely sidestepped. But re-derivation cost grows with the stream (every interpreter change runs over historical raw data), and reducer-version tracking becomes its own discipline.
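A minimal sketch of the FRP-shaped variant, assuming hypothetical types: raw observations are canonical and never reshaped, derived events carry the version of the reducer that produced them, and re-derivation rebuilds only what is stale for the current reducer. The version stamp is one simple way to do the reducer-version tracking mentioned above, not the only one.

```typescript
// Canonical events: raw observations whose shape never changes.
interface RawObservation {
  id: string;
  text: string;
  capturedAt: string;
}

// Throwaway typed events, stamped with the reducer version that produced them.
interface DerivedEvent {
  observationId: string;
  reducerVersion: number;
  mentionsPayment: boolean;
}

interface Reducer {
  version: number;
  derive(obs: RawObservation): DerivedEvent;
}

const reducerV1: Reducer = {
  version: 1,
  derive: (obs) => ({ observationId: obs.id, reducerVersion: 1, mentionsPayment: obs.text.includes("invoice") }),
};

// A new interpretation of the same stream: bump the version, leave the raw data untouched.
const reducerV2: Reducer = {
  version: 2,
  derive: (obs) => ({ observationId: obs.id, reducerVersion: 2, mentionsPayment: /invoice|payment|paid/i.test(obs.text) }),
};

// Re-derive every typed event that is missing or came from a different reducer version.
// The loop walks the whole raw stream, which is why re-derivation cost grows with history.
function rederive(raw: readonly RawObservation[], derived: Map<string, DerivedEvent>, reducer: Reducer): number {
  let rebuilt = 0;
  for (const obs of raw) {
    if (derived.get(obs.id)?.reducerVersion === reducer.version) continue; // already current
    derived.set(obs.id, reducer.derive(obs));
    rebuilt++;
  }
  return rebuilt;
}
```

Running `rederive` with `reducerV2` after `reducerV1` rebuilds every derived event once; running it again is a no-op. If nobody ever bumps the version, the raw layer is the “theatre” described in the failure mode below.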

Failure mode: Event sourcing theatre. An events table that nothing reads from after the projection runs. You’ve added complexity and gotten none of the temporal benefits. (Or in the FRP variant: a raw stream that you never actually re-derive from, in which case you could have stored the typed events directly and saved infrastructure.)

Mitigation: Apply ES only to the contexts that need it. A ledger inside an otherwise-Flavour-3 monolith is fine. ES across the whole system because “events feel right” is a costly mistake. The unit of decision is the bounded context, not the application. A caution on the boundary itself: when you design DDD bounded contexts, you rarely get them right on the first pass. You explore and rework them as you learn the domain, and that reworking is much cheaper inside a modular monolith, where moving a boundary is a refactor, than across event-sourced contexts, where the event shapes and projections have already hardened around the wrong line. Settle the contexts in the monolith first; reach for ES once the boundary has stopped moving. For FRP-shaped ES specifically: make sure re-derivation is a real operation you actually run, not theoretical infrastructure. If you never re-derive, you don’t need the canonical-raw shape.

LLM-producer note: Classical ES (typed domain events as canonical) is hostile to LLM-produced data: the events are non-deterministic and the schema is in flux, both of which make the immutable-log assumption fragile. FRP-shaped ES inverts this: the canonical events are raw inputs (audio, image, document, whatever the LLM consumed) whose shape is stable, and the typed events are throwaway interpretations. This is the right shape for LLM-driven products whose value involves iterating extraction interpretations over time. The classical ES failure mode (canonical typed events that the LLM produced) doesn’t occur.

Composition: what fits with what

Combinations that work

  • Postgres RLS + shared transactions (Flavour 2 or 3). RLS sets the tenant context once at the request boundary; modules don’t need to re-check tenancy on every query.
  • JSONB payloads + strict types at the application boundary. The DB stores opaque structure; the application gives you full type safety. Especially good when an LLM produces the data.
  • Cross-module foreign keys (no cascade) + application-orchestrated deletes. FKs prevent orphans; the application runs the deletion logic so side effects (Stripe cancellations, blob storage deletions, audit rows) actually fire.
  • Postgres schemas per module + type-level connection narrowing. Logical isolation in two places (DB and types) without giving up cross-module joins or shared transactions when needed.
  • Outbox pattern + LISTEN/NOTIFY or a job runner for the async edges of an otherwise-synchronous app.
  • Pure hexagonal + orchestration (Flavour 5) when you actually need durable workflows and you have a real adapter-swap story.
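For the “JSONB payloads + strict types at the application boundary” combination, a sketch of what the boundary can look like in TypeScript. `ExtractedContact` is a hypothetical LLM-extracted shape, and the validator is hand-written only to keep the example self-contained; a schema-validation library would normally play this role. The database only ever sees opaque JSON, and nothing past this function sees anything but the checked type.

```typescript
// Hypothetical shape stored in a JSONB column, often produced by an LLM.
interface ExtractedContact {
  name: string;
  email: string | null;
  confidence: number;
}

type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

// Everything read from JSONB (or returned by a model) is `unknown` until this says otherwise.
function parseExtractedContact(input: unknown): ParseResult<ExtractedContact> {
  if (typeof input !== "object" || input === null || Array.isArray(input)) {
    return { ok: false, error: "expected a JSON object" };
  }
  const { name, email, confidence } = input as Record<string, unknown>;
  if (typeof name !== "string" || name.length === 0) {
    return { ok: false, error: "name must be a non-empty string" };
  }
  let checkedEmail: string | null;
  if (email === null) checkedEmail = null;
  else if (typeof email === "string") checkedEmail = email;
  else return { ok: false, error: "email must be a string or null" };
  // The negated range check also rejects NaN.
  if (typeof confidence !== "number" || !(confidence >= 0 && confidence <= 1)) {
    return { ok: false, error: "confidence must be a number between 0 and 1" };
  }
  return { ok: true, value: { name, email: checkedEmail, confidence } };
}
```

Returning a result instead of throwing lets the caller decide what a malformed payload means: a retry, a “needs review” state, or a user-visible error.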

Combinations that fight each other

  • Pure hexagonal + ACID across modules. If the DB is “just an adapter,” cross-module transactions can’t exist: you either need a distributed transaction protocol or have to accept eventual consistency. Pick one. (See the Distributed transactions section for what those protocols actually are and why you usually don’t want them.)
  • Event sourcing + traditional foreign keys on event tables. Events are immutable and ordered; you don’t FK them. You FK on projections, if at all.
  • ON DELETE CASCADE across module boundaries + any pretence of module isolation. Cascades silently reach across modules and bypass the application. The boundary becomes a fiction.
  • Eventual consistency between modules + the ergonomics of synchronous code. If the modules are async, you have sagas; if you handwave the sagas, you have data corruption.
  • LLM as data producer + tightly typed schema you cannot operate on. This is schema capture: the LLM fabricates events that match your types because the schema told it to. The types must earn their keep at the reduction site.

Combinations that are noise (work but cost you)

  • Repository interfaces + a single adapter, no test-fake either. If you have one production adapter and your tests run against the real DB anyway, the interface is pure tax. (If you do use an in-memory fake for tests, the second adapter is real and the abstraction earns its keep; see Flavour 5.)
  • Application-layer transaction managers + a query builder that already provides them. Use the underlying mechanism.
  • Domain events between modules in the same transaction in the same process. You’ve made yourself eventually consistent for no reason. Just call the function.
  • Pure hexagonal in a CRUD app. The domain is anemic; the abstraction earns nothing.

Migration paths in the vector space

Because flavours are vectors, “migration” is movement along one or more axes, not progress along a single line.

Cheap movements (one axis at a time)

  • Flavour 1 → 2: Pull logic out of triggers into application code. Mostly mechanical.
  • Flavour 2 → 3: Reorganise files into module folders. Introduce module facades. Add Postgres schemas. The code largely doesn’t change.
  • Flavour 3 → 3-with-one-async-edge: Add an outbox table for one specific event type. Add a relay. Don’t convert everything else.
  • Flavour 3 → Flavour 5 (only the orchestration axis): Adopt a workflow runtime for one long-running process. Keep everything else Flavour 3. This is the cheapest way to get durable workflows without committing to full hexagonal.
  • Adding ReBAC on top of any flavour: Mostly an additive change. Define your permission model, route authorisation calls through it.
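A sketch of the “add an outbox table, add a relay” step, using in-memory stand-ins for two tables so it runs without a database. The names (`placeOrder`, `relayOnce`, the `order.placed` topic) are hypothetical. In Postgres the relay would typically claim rows with `SELECT ... FOR UPDATE SKIP LOCKED` so concurrent relays don’t grab the same row, and it could be woken by LISTEN/NOTIFY instead of polling; both are omitted here.

```typescript
// In-memory stand-ins for an orders table and an outbox table.
interface Order {
  id: string;
  total: number;
}
interface OutboxRow {
  id: number;
  topic: string;
  payload: unknown;
  publishedAt: Date | null;
}
interface Db {
  orders: Map<string, Order>;
  outbox: OutboxRow[];
  nextOutboxId: number;
}

// Crude model of BEGIN/COMMIT/ROLLBACK: work on a copy, swap it in only if `fn` succeeds.
function transaction(db: Db, fn: (tx: Db) => void): void {
  const tx: Db = {
    orders: new Map(db.orders),
    outbox: db.outbox.map((row) => ({ ...row })),
    nextOutboxId: db.nextOutboxId,
  };
  fn(tx);
  db.orders = tx.orders;
  db.outbox = tx.outbox;
  db.nextOutboxId = tx.nextOutboxId;
}

// The business write and its outbox row commit together, so neither exists without the other.
function placeOrder(db: Db, id: string, total: number): void {
  transaction(db, (tx) => {
    tx.orders.set(id, { id, total });
    tx.outbox.push({ id: tx.nextOutboxId++, topic: "order.placed", payload: { orderId: id, total }, publishedAt: null });
  });
}

// Relay: publish unpublished rows in insertion order, marking each only after publish
// returns. A crash or a throwing publish leaves the row to be sent again on the next
// pass, so delivery is at-least-once and consumers must be idempotent.
function relayOnce(db: Db, publish: (row: OutboxRow) => void, now: () => Date = () => new Date()): number {
  let sent = 0;
  for (const row of db.outbox) {
    if (row.publishedAt !== null) continue;
    publish(row);
    row.publishedAt = now();
    sent++;
  }
  return sent;
}
```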

Expensive movements (multiple axes at once)

  • Flavour 2 or 3 → Flavour 7: Rewrite. Every read site has to change to hit projections. Every write site has to produce events. You will discover invariants you didn’t know existed.
  • Synchronous → eventual consistency for an existing flow: Every read-after-write site has to be redesigned. Sagas need to handle every failure mode. Months of work.
  • Modular monolith → microservices: Each shared transaction becomes a saga. Each FK becomes a network call. Local debugging becomes distributed tracing.
  • Adding RLS to a system that didn’t have it: Every existing query needs auditing. Every test fixture needs tenant context. Easy to miss spots.

Movements that are reversible

  • Flavour 3 ↔ Flavour 4 (you can add or remove the outbox edge easily).
  • Flavour 1 ↔ 2 (mechanical in either direction, though moving logic back into triggers is rarely what you want).
  • Flavour 5 → 3 (delete the abstraction layers; rare but possible).

Movements that are practically irreversible

  • Flavour 2/3 → Flavour 7 (event sourcing is a one-way door).
  • Monolith → microservices (in practice, rarely reversed).

Common misframings

  • “We need microservices to scale” almost always means “we have a hot module”; extract just that one. The rest stays.
  • “We need event sourcing for audit” almost always means “we need an audit log table.” That’s not event sourcing.
  • “We need hexagonal because we might switch databases”. You usually won’t. And if you do, the cost of the switch will dwarf the cost of writing the adapter at that point. Don’t pay now for an option you won’t exercise.
  • “We need RLS later, not now”. Adding RLS to an existing system is painful. Bake it in early.

Tenancy as an architectural axis

Tenancy is sometimes treated as a wart you bolt on, but in SaaS it’s structurally important enough to be an axis on its own. Where you draw the tenancy boundary determines which flavours are easy and which are expensive, and a boundary drawn in the wrong place is costly to move later.

The mechanisms, roughly from most-coupled-to-DB to most-coupled-to-app:

Mechanism A: Postgres Row-Level Security (RLS)

The DB enforces tenancy. Every query implicitly filters by the tenant context, set per session/transaction (SET LOCAL app.tenant_id = '...').
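A sketch of setting the tenant context per transaction from application code, assuming a hypothetical synchronous `SqlClient` (real drivers such as node-postgres are asynchronous, but the shape is the same). One practical detail: `SET LOCAL` does not accept bind parameters, so the sketch uses Postgres’s `set_config(name, value, true)`, which is transaction-scoped in the same way and can be parameterised safely. The policy and table name in the comment are illustrative.

```typescript
// Hypothetical minimal client; real drivers are async and return rows.
interface SqlClient {
  query(sql: string, params?: unknown[]): void;
}

// Policy this relies on (a one-off migration, shown for reference; table name is illustrative):
//   ALTER TABLE invoices ENABLE ROW LEVEL SECURITY;
//   CREATE POLICY tenant_isolation ON invoices
//     USING (tenant_id = current_setting('app.tenant_id')::uuid);
// Table owners bypass RLS unless the table also has FORCE ROW LEVEL SECURITY, and superusers
// or BYPASSRLS roles always bypass it, so the app should connect as a role that is neither.

// Run `fn` in a transaction whose tenant context is set before any query executes.
// set_config(..., true) is transaction-local like SET LOCAL, but accepts a bind parameter.
function withTenant<T>(client: SqlClient, tenantId: string, fn: (tx: SqlClient) => T): T {
  client.query("BEGIN");
  try {
    client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = fn(client);
    client.query("COMMIT");
    return result;
  } catch (err) {
    client.query("ROLLBACK");
    throw err;
  }
}
```

Because the setting dies with the transaction, a pooled connection can’t leak one tenant’s context into the next request that borrows it.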

Where it fits well:

  • Flavour 0 (PostgREST/Supabase basically requires it as the auth model).
  • Flavours 1, 2, 3 (you set the context once at the request boundary; everything else is automatic).

Where it gets harder:

  • Flavour 4 (the worker that processes outbox events must also set RLS context; solvable, but it’s a discipline that needs codifying).
  • Flavours 5–6 (you’ve abstracted the DB behind ports, so RLS becomes “just an adapter detail”, yet it is still your security boundary. That is awkward: the domain doesn’t know about tenancy, but security depends on the adapter setting the context correctly).
  • Flavour 7 (events get tenant-scoped at write; projections can be tenant-scoped via RLS; it works, but needs care).

Strengths: Defence in depth. Even a buggy query can’t leak across tenants. Application code becomes blissfully tenancy-ignorant. Best-in-class for B2C and small-tenant B2B.

Weaknesses: Performance can degrade if RLS policies don’t compose well with indexes (a real issue with complex policies; test under load). Admin tooling needs a privileged role that bypasses RLS, which is a separate code path that needs its own care. Doesn’t help if your tenancy boundary is more complex than “rows have a tenant_id” (e.g., shared resources with per-row sharing rules).

Mechanism B: Access envelope (application-layer)

Every query takes an explicit actor or tenant context. The application enforces tenancy by always filtering. Often paired with a request-scoped context propagated through the call stack.
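A sketch of the envelope in its simplest form: an explicit, hypothetical `AccessEnvelope` parameter on every query function, plus a runtime check so an envelope that arrives missing (for example from a parsed job payload) fails closed instead of running unscoped. Request-scoped propagation, such as async-local storage, would remove the explicit parameter, but the filter still has to be written at every query site, which is the discipline cost described below.

```typescript
// Hypothetical envelope: who is acting, for which tenant, and on whose behalf.
interface AccessEnvelope {
  readonly actorId: string;
  readonly tenantId: string;
  readonly impersonatedBy?: string; // set when support staff act as a user
}

interface Project {
  id: string;
  tenantId: string;
  name: string;
}

// The type system requires an envelope at every typed call site; the runtime check covers
// values that arrive untyped (JSON.parse, job payloads) and fails closed.
function listProjects(env: AccessEnvelope, rows: readonly Project[]): Project[] {
  if (!env?.tenantId) {
    throw new Error("no access envelope: refusing to run an unscoped query");
  }
  const tenantId = env.tenantId;
  // The tenant filter lives here, in the query function; forget it once and tenants leak.
  return rows.filter((p) => p.tenantId === tenantId);
}
```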

Where it fits well: Any flavour. It’s the mechanism most independent of the data layer.

Strengths: Works with any storage. Easy to add additional context (acting user, impersonation, scope). Easy to test (pass a context to a function).

Weaknesses: Discipline-dependent. One forgotten filter is a cross-tenant leak. No defence in depth. Code is more verbose. Easy to drift over time as developers add new query sites.

When to choose this over RLS: When your storage isn’t Postgres, when you have multiple data stores with different tenancy mechanisms, or when your tenancy rules are too complex for RLS (e.g., document-sharing systems where a row can be visible to multiple tenants).

Mechanism C: RBAC / ReBAC (role-based and relationship-based access control)

Authorisation as a separate concern, often delegated to a service or library (SpiceDB, OpenFGA, Cedar, Oso). Decisions are queries against a permission model: can actor X perform action Y on resource Z?

This is orthogonal to the tenancy mechanisms above. RBAC/ReBAC handle what an authenticated tenant member can do within their tenant, not which tenant’s rows they can see. RLS and access envelopes handle the latter; RBAC/ReBAC handle the former. Real B2B SaaS usually needs both.
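To show what “can actor X perform action Y on resource Z?” looks like as a query, here is a toy relationship-based check in TypeScript. The tuple format loosely follows the Zanzibar-style model that SpiceDB and OpenFGA are built on, but the relation names, the inheritance rule and the `check` function are hypothetical and are not either product’s API. It answers intra-tenant questions only; which tenant’s rows are visible is still the job of RLS or the envelope.

```typescript
// Relationship tuples, read as "subject has relation on object".
interface Tuple {
  object: string;
  relation: string;
  subject: string;
}

type Action = "view" | "edit";

// Which relations on an object grant each action (an assumed, illustrative model).
const grantingRelations: Record<Action, readonly string[]> = {
  view: ["viewer", "editor", "owner"],
  edit: ["editor", "owner"],
};

function isMember(tuples: readonly Tuple[], group: string, actor: string): boolean {
  return tuples.some((t) => t.object === group && t.relation === "member" && t.subject === actor);
}

// Can `actor` perform `action` on `resource`? Checks direct grants, grants via group
// membership, and grants inherited from a parent (e.g. a folder containing a document).
function check(tuples: readonly Tuple[], actor: string, action: Action, resource: string, depth = 0): boolean {
  if (depth > 10) return false; // guard against cycles and runaway nesting
  for (const t of tuples) {
    if (t.object !== resource) continue;
    if (grantingRelations[action].includes(t.relation)) {
      if (t.subject === actor) return true;
      if (t.subject.startsWith("group:") && isMember(tuples, t.subject, actor)) return true;
    }
    if (t.relation === "parent" && check(tuples, actor, action, t.subject, depth + 1)) return true;
  }
  return false;
}

const tuples: Tuple[] = [
  { object: "folder:roadmaps", relation: "editor", subject: "user:alice" },
  { object: "doc:q3-plan", relation: "parent", subject: "folder:roadmaps" },
  { object: "doc:q3-plan", relation: "viewer", subject: "group:sales" },
  { object: "group:sales", relation: "member", subject: "user:bob" },
];
```

Here Alice can edit the document because she edits its folder, while Bob can view it through his group but not edit it: the sharing pattern that is awkward to express as an RLS policy.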

Where it fits well: Any flavour. Especially valuable in B2B, where intra-tenant authorisation is itself complex (admins vs members, project-level permissions, document sharing).

Strengths: Permissions are testable and inspectable. The model can be reasoned about independently of storage. Relationship-based models (ReBAC) handle complex sharing patterns that RLS struggles with.

Weaknesses: Adds a service/library to your stack. Latency on every authorisation check (caching mitigates). The permission model itself becomes a thing to maintain.

When you need it: B2B SaaS the moment you have more than one role per tenant and any concept of resource-level permissions. B2C systems with social/sharing features (private/public/friends, document sharing).

Mechanism D: Schema-per-tenant or DB-per-tenant

Each tenant gets their own Postgres schema (or their own DB). A routing layer maps each request’s tenant to the right schema (or database connection) before any query runs.
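A sketch of the schema-per-tenant routing step, assuming a hypothetical tenant-to-schema map. The details worth getting right are quoting the schema name as a Postgres identifier (double quotes, with embedded double quotes doubled) and scoping `search_path` with `SET LOCAL` so a pooled connection can’t carry one tenant’s schema into the next request. `SET LOCAL` only has an effect inside a transaction, so the statement belongs right after `BEGIN`.

```typescript
// Quote a Postgres identifier: wrap in double quotes, doubling any embedded double quote.
function quoteIdent(name: string): string {
  return `"${name.replace(/"/g, '""')}"`;
}

// Hypothetical routing table from tenant id to schema name; in practice this would live
// in a shared control-plane table rather than in code.
const tenantSchemas = new Map<string, string>([
  ["acme", "tenant_acme"],
  ["globex", "tenant_globex"],
]);

// Statement to run immediately after BEGIN on a pooled connection. SET LOCAL lasts only
// until the transaction ends, so the next checkout of this connection starts clean.
function searchPathStatement(tenantId: string): string {
  const schema = tenantSchemas.get(tenantId);
  // Fail closed: an unknown tenant must never fall through to a shared or default schema.
  if (schema === undefined) throw new Error(`unknown tenant: ${tenantId}`);
  return `SET LOCAL search_path TO ${quoteIdent(schema)}`;
}

console.log(searchPathStatement("acme")); // SET LOCAL search_path TO "tenant_acme"
```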

Where it fits well: B2B with strict isolation requirements. Healthcare, finance, government. A small number of large, high-trust tenants where data residency or regulatory isolation matters.

Strengths: Hardest possible isolation. Tenant data physically separated. Per-tenant performance tuning, backups, restores. Trivial to export or delete a tenant.

Weaknesses: Operational complexity. Migrations across thousands of schemas are slow. Connection pool churn. Awkward to do cross-tenant analytics (which is sometimes the point). Doesn’t scale to millions of tenants.

Layered defence: use all three together

A subtle but important point: RLS, access envelopes, and RBAC/ReBAC are not alternatives. They are layers, and mature systems use all three at once. Each catches a different class of bug.

  • RLS catches “wrong tenant.” If a query is missing a WHERE tenant_id = ... clause, RLS still filters. If a developer joins to a table they shouldn’t, RLS still filters. The DB is the last line of defence and the only one that survives application bugs.
  • Access envelope catches “missing context.” If code runs without an actor attached, it fails closed rather than running with privilege. It also gives you the audit trail, the metering attribution, and the structured logging, all of which depend on knowing whose work it is.
  • RBAC/ReBAC catches “wrong authorisation.” Even when the actor and tenant are correct, this answers “is this actor allowed to perform this action on this resource?” RLS can’t reason about action-level authorisation; the envelope can’t reason about complex permission models; ReBAC can.

A correctly-functioning request typically passes through all three: the envelope carries the actor, RLS scopes the data the actor can see, and RBAC/ReBAC checks whether the requested operation is permitted. Any one of them firing in isolation is a bug to investigate. None of them firing is the disaster.

The layering matters because each layer has different failure modes. RLS policies can have bugs (especially complex ones interacting with indexes). Access envelopes can be skipped under deadline pressure. ReBAC models can be miswritten. The probability of all three failing simultaneously on the same request is low. That’s defence in depth.

The combinations in the previous section were oversimplified. The correct framing is “B2C apps need all three, with simpler models in each layer; B2B apps need all three, with more complex models, especially at the ReBAC layer.” Schema-per-tenant is an additional defence layer on top, not a replacement for the others.

How tenancy interacts with each flavour

Flavour | Recommended layering | Notes
0: DB-as-API | RLS (essentially the only layer available) | Authorisation is also done at the DB. Limited modelling power but works for simple cases.
1: SQL-first | RLS + RBAC in the application | Maps naturally onto fat models with a tenant-scoped session.
2: Transactional script | RLS + access envelope + RBAC/ReBAC | RLS does the heavy lifting; envelope handles cross-cutting actor info; RBAC/ReBAC for action-level.
3: Modular monolith, shared tx | RLS + access envelope + RBAC/ReBAC | Set RLS and envelope at the request boundary; modules consult them. Re-establish all three at every async boundary (when a worker picks up a job, it must re-set RLS and re-load the envelope from the job payload).
4: Modular monolith, choreographed | Same as Flavour 3, with even more disciplined async-boundary handling | Workers must establish context from event payload. Easy to forget. See the principal-passing section below.
5: Pure hexagonal, orchestrated | Access envelope (carried in workflow state) + RLS at storage adapter + RBAC/ReBAC at orchestrator step boundaries | The orchestrator carries the actor context across steps; the storage adapter sets RLS; authorisation is checked at each step that takes a privileged action.
6: Pure hexagonal, choreographed | Access envelope + RBAC/ReBAC in each service; possibly schema-per-tenant for high-trust | Cross-service auth typically uses signed tokens carrying actor identity. RLS still applies inside each service’s storage.
7: Event Sourcing | Access envelope on commands + RLS on projections + RBAC/ReBAC on commands | Events carry tenant id; projections use RLS for read isolation; command authorisation happens before the event is appended.

Principal-passing across async boundaries: is it safe?

A key practical question, especially for Flavour 3 and Flavour 4: when a worker picks up a job, can it trust the principal stored in the job payload? Can it set RLS context based on that principal?

The short answer: yes, with specific conditions met. The longer answer requires thinking about three threat models.

Producer trust. If the job was enqueued by code that had already verified the principal (the original HTTP request, after authentication), and that code is the only path that writes to the job table / outbox, then the principal in the payload is trustworthy. The worker can read it and set RLS context from it without re-authenticating, because the system’s invariant is: “anything in the job table was written by authenticated code.”

Tampering between producer and consumer. Could the principal be modified after enqueue but before consumption? In a single Postgres database, this requires either DB write access (game over anyway) or a SQL-injection / payload-injection vulnerability that lets unsanitised user input become part of an outbox row. The mitigation is the standard one: never let user input flow into a job payload unsanitised, and treat the job payload as authenticated only because the write path is authenticated.

Forgery at consumption. Could a malicious actor enqueue a job directly with a forged principal, bypassing the authenticated write path? This is the dangerous scenario. It requires:

  • A path to insert into the job table that doesn’t go through the authenticated code path (a public queue endpoint, an HTTP-triggered job, a cross-tenant admin tool, an SQL injection)
  • Or a job runner that accepts jobs from untrusted sources

If your only way to create a job is through your own server’s authenticated handlers, forgery isn’t possible. If you have any other path (a webhook that creates jobs, a public API that creates jobs, a third-party integration), those paths must independently authenticate.

The defensive disciplines that make this safe:

  1. The principal in the payload must be the principal verified at enqueue time. Not a principal claimed by the client, not a principal looked up at consumption. The principal at the moment of authenticated write.
  2. The job table is treated as authenticated data because the write path is. Document this invariant. Make it impossible to write a job through any path that hasn’t authenticated.
  3. The worker re-establishes the envelope and RLS context from the payload, then runs everything else as if it had been a fresh request. The principal is present from the start of the worker’s transaction.
  4. Consider having the worker re-validate that the principal is still active. Between enqueue and consumption, the user could have been deactivated, the tenant deleted, the role revoked. Check defensively rather than running with a stale principal, especially for jobs that may sit in the queue for hours or days.
  5. Sign job payloads only if they cross trust boundaries. Within a single trusted database, signing the payload is overkill (you’d be defending against the DB being compromised, at which point signing doesn’t help). Across services or processes, signing or HMACing the principal is appropriate.
  6. Adversarial integration tests with crafted payloads. Try to enqueue jobs with forged principals through every entry point. Verify they fail closed. This is non-optional for any system that handles tenant data.
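A compressed sketch of disciplines 1 to 4, with hypothetical in-memory stand-ins for the job table and the user directory. The only write path to the job table takes an already-authenticated request and copies the principal from it; the worker re-validates that principal and rebuilds context from it before running the handler. Signing (discipline 5) is left out because this sketch stays inside one trusted database.

```typescript
// Principal as verified at enqueue time (disciplines 1 and 2).
interface Principal {
  userId: string;
  tenantId: string;
}

interface Job {
  id: string;
  kind: string;
  principal: Principal;
  args: Record<string, unknown>;
}

// Hypothetical: only the authentication middleware can construct one of these.
interface AuthenticatedRequest {
  readonly verifiedPrincipal: Principal;
}

const jobTable: Job[] = [];

// The single write path to the job table. The principal is copied from the verified
// request, never read from the request body or from `args`.
function enqueue(req: AuthenticatedRequest, kind: string, args: Record<string, unknown>): Job {
  const job: Job = { id: `job-${jobTable.length + 1}`, kind, principal: { ...req.verifiedPrincipal }, args };
  jobTable.push(job);
  return job;
}

// Hypothetical directory lookups used to re-validate the principal (discipline 4).
interface Directory {
  isUserActive(userId: string): boolean;
  isMember(userId: string, tenantId: string): boolean;
}

// Worker side (discipline 3): re-validate, then rebuild context before touching any data.
function runJob<R>(job: Job, dir: Directory, handler: (ctx: Principal, args: Record<string, unknown>) => R): R {
  const { userId, tenantId } = job.principal;
  if (!dir.isUserActive(userId) || !dir.isMember(userId, tenantId)) {
    throw new Error(`job ${job.id}: principal is no longer authorised; failing closed`);
  }
  // In a real worker this is where the transaction opens and the RLS context
  // (e.g. set_config('app.tenant_id', tenantId, true)) and the envelope are established.
  return handler({ userId, tenantId }, job.args);
}
```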

So: yes, store the principal on the job and have the worker set context from it. It’s the right pattern. The security comes from controlling the write path to the job table and treating the payload as authenticated because the write path was, not from any property of the worker itself.

This whole class of bug (the deputy that holds standing authority and can be confused into using it on a caller’s behalf) appears throughout LLM-agent systems too, where the agent is a deputy by construction. For the full treatment, see The Confused Deputy Problem in Agent Systems.

The B2C-to-B2B path

A common SaaS arc is starting B2C (tenant == user) and later going B2B (tenant == organisation, with users belonging to tenants). If you’ve built on RLS from day one with tenant_id as the scoping column (even when tenant_id == user_id), the migration is mostly additive: add a tenant_members table, change tenant resolution at the request boundary, layer ReBAC on top. If you’ve built tenancy as “the user is the tenant” with no separate tenant concept, the migration is much more invasive. Decouple early.
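A sketch of what “decouple early” buys, assuming a hypothetical `tenant_members` shape. In the B2C phase every user simply gets a personal tenant; going B2B adds membership rows and changes tenant resolution at the request boundary, while queries that already scope by `tenant_id` stay as they are. Where the requested tenant comes from (header, subdomain, path segment) is an assumption left open here.

```typescript
// Even in the B2C phase, tenant and user are separate concepts in the data model.
interface TenantMember {
  tenantId: string;
  userId: string;
  role: "owner" | "member";
}

// Hypothetical tenant_members rows.
const tenantMembers: TenantMember[] = [
  // B2C phase: each user owns a personal tenant (here its id happens to equal the user id).
  { tenantId: "u-alice", userId: "u-alice", role: "owner" },
  // B2B phase: the same user also belongs to an organisation tenant. Adding this row is additive.
  { tenantId: "org-acme", userId: "u-alice", role: "member" },
];

// Tenant resolution at the request boundary: going B2B changes this function,
// not the tenant-scoped queries behind it.
function resolveTenant(userId: string, requested?: string): string {
  const memberships = tenantMembers.filter((m) => m.userId === userId);
  if (memberships.length === 0) throw new Error("user has no tenant");
  if (requested === undefined) {
    // No explicit tenant: fall back to the personal tenant, which is the B2C behaviour.
    const personal = memberships.find((m) => m.tenantId === userId) ?? memberships[0]!;
    return personal.tenantId;
  }
  if (!memberships.some((m) => m.tenantId === requested)) {
    throw new Error("not a member of the requested tenant");
  }
  return requested;
}
```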

Async-ness as an architectural driver

The classical assumption behind most architecture writing is that the dominant interaction is “user makes request, server does some work, server responds.” Async work (sending an email, processing a file) is the exception, handled by a queue bolted onto the side. Pick a flavour for the synchronous core; figure out async edges later.

For modern LLM-driven products, this assumption is partially wrong. Some interactions are long-running async work: user submits an input, processing runs, extraction runs, a domain update happens, the user sees the result thirty seconds later. But the same application still has plenty of synchronous CRUD work too: logging in, viewing a settings page, updating a profile, listing past records.

The architectural insight is that most real applications host a mosaic of async profiles simultaneously, and the architecture must support all of them without forcing one pattern everywhere. This deserves explicit treatment as an axis.

Most apps are mosaics, not monoliths

A typical SaaS application (including LLM-driven ones) has interactions across most of these categories at once:

  • Identity, auth, settings, profile management: synchronous CRUD. Login is a request-response. There’s no benefit to making “update display name” async.
  • Tenancy management, team membership, role changes: synchronous CRUD with maybe one or two reliable-async edges (provisioning, audit log shipping).
  • Billing core: mostly synchronous (subscription state queries). One or two reliable-async edges (Stripe webhook reconciliation, invoice generation). Occasionally long-running (data export of historical billing).
  • Domain reads / queries: mostly synchronous. The user wants the answer now.
  • Domain writes that are LLM-driven: long-running user-perceived async (the input → extraction pattern). The bulk of complexity, but not the bulk of interactions by count.
  • Notifications, transactional email: fire-and-forget or reliable async.
  • Webhook delivery to external systems: reliable async with retry.
  • Audit log shipping, analytics events: fire-and-forget async or batch.
  • Multi-step LLM workflows or agentic loops: durable workflow (orchestrator territory).
  • Real-time feeds, presence, live updates: streaming/push (a different infrastructure layer entirely).

Trying to pick “the async profile” of an application is a category error. The application has many async profiles at once, and the architecture must let each interaction pick the right one cheaply.

Practically, this means:

  • The chassis (auth, billing, tenancy) lives in synchronous-CRUD-land. Don’t make it async for consistency’s sake.
  • Reliable-async edges (emails, webhooks, audit shipping) use the outbox pattern and a job runner. Not orchestration; it’s overkill for these.
  • Long-running user-perceived work (single LLM calls, file processing) uses a job runner with idempotency. Still not orchestration.
  • Multi-step workflows and agentic loops use a workflow orchestrator. Now orchestration earns its keep.
  • Streaming and real-time use whatever streaming infrastructure you’ve adopted, separately.

The flavour decision isn’t “pick the async profile of your dominant interaction.” It’s “pick the architecture that hosts all your async profiles cleanly.” Flavour 3 with a job runner handles the first three categories fine. Add an orchestrator selectively for the multi-step/agentic flows. Add streaming separately if you need it. Don’t over-commit to one async pattern across the whole app.

Categories of async work

Not all async is the same. The architectural implications depend on which categories dominate the work that’s complex or risky.

  • Within-request async (await DB, await another service, await fast HTTP). Trivial. All flavours handle it. Not interesting for this discussion.
  • Fire-and-forget side effects (analytics events, low-stakes notifications). Cheap. A simple in-process queue or even a “schedule and forget” pattern is fine. Failures are tolerable.
  • Reliable async work (transactional emails, webhook deliveries, billing reconciliation, audit log shipping). Must complete eventually. Needs at-least-once delivery with idempotency. A job runner plus the outbox pattern handles this well. This is the classical “edge async” most architectures assume.
  • Long-running compute, user-perceived (LLM calls, video processing, image generation, heavy report generation). The user is waiting. The work is too long for an HTTP request (5–60 seconds, sometimes minutes). It can fail and need to be retried. The result needs to be persisted reliably and the user needs to be told when it’s done. This is the category LLM-driven apps live in for their primary value-creating interaction, even if it’s not the most-frequent interaction by count.
  • Multi-step durable workflows (loan applications, KYC pipelines, multi-stage LLM enrichment, agentic loops, scheduled batch processing). Long-running, multiple distinct steps, each fallible, each potentially needing compensation. Spans minutes to days. Needs durable state at each step.
  • Streaming / continuous (real-time feeds, change-data-capture, websocket fanout). A different beast entirely; usually a separate part of the system rather than the core.

The architectural question isn’t “do you have async work”; almost every app does. It’s “which categories appear in your app, and what mix of mechanisms do you need to support them cleanly?”

The LLM-call shape, specifically

A single user-perceived LLM call has a very specific shape that constrains your architecture:

  • Duration: typically 2–30 seconds, sometimes longer with reasoning models or large contexts. Always too long for an HTTP request handler that you want to be reliable on mobile networks.
  • Cannot run inside a DB transaction. Holding a Postgres connection open for 15 seconds while waiting on an LLM is a connection-pool detonation under any non-trivial load. The work splits into: (open tx → read what’s needed → close tx) → (LLM call, off-tx) → (open tx → validate + persist → close tx). This is non-negotiable.
  • Failure modes are part of the UX. What does the UI show during the 15-second wait? What if the LLM call times out? What if validation fails on the response? What if the model returns malformed JSON? Each of these is a state the user can be in.
  • Idempotency is mandatory. A retry must not produce duplicate domain mutations. Either the worker writes idempotently (upsert by the natural id of the work), or the orchestrator handles dedup, or both.
  • The principal must travel. Whoever’s work it is, the worker needs that context to set RLS, log audit trails, attribute metering, and authorise downstream calls. The job payload carries the principal; the consumer re-establishes context before doing anything.
  • Cost is real and per-call. Unlike “free” CPU work, every LLM call costs cents to dollars. This shapes architecture too: you want caching at clear boundaries, deduplication of identical requests, observability per-call, and the ability to swap models or providers without restructuring.
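A sketch of the short-transaction / off-transaction / short-transaction split with an idempotent write, using hypothetical in-memory stand-ins for the tables and a synchronous stand-in for the model call (in real code it is an awaited HTTP request, and `inTx` would check out a pooled connection and wrap `fn` in BEGIN/COMMIT). The persist step is keyed by the natural id of the work, here the input id, which in Postgres would be an `INSERT ... ON CONFLICT (input_id) DO UPDATE`.

```typescript
// Hypothetical stand-ins: two "tables" and a model call injected by the caller.
interface Store {
  inputs: Map<string, string>; // input_id -> raw text
  extractions: Map<string, { inputId: string; summary: string }>; // keyed by input_id
}

// Stand-in for "check out a connection, BEGIN, run fn, COMMIT, release".
function inTx<T>(store: Store, fn: (tx: Store) => T): T {
  return fn(store);
}

// Synchronous stand-in for what is an awaited HTTP call in real code.
type ModelCall = (prompt: string) => string;

function processInput(store: Store, inputId: string, callModel: ModelCall): "created" | "already-done" {
  // Tx 1: read what the call needs, then release the connection.
  const text = inTx(store, (tx) => {
    if (tx.extractions.has(inputId)) return null; // a retry after success: nothing to do
    const t = tx.inputs.get(inputId);
    if (t === undefined) throw new Error(`unknown input ${inputId}`);
    return t;
  });
  if (text === null) return "already-done";

  // Off-transaction: the slow, costly, fallible part. No connection is held while waiting.
  const raw = callModel(`Summarise:\n${text}`);

  // Tx 2: validate the response, then persist keyed by the natural id of the work, so a
  // retry that races an earlier success overwrites instead of duplicating.
  return inTx(store, (tx) => {
    const summary = raw.trim();
    if (summary.length === 0) throw new Error("model returned an empty summary");
    tx.extractions.set(inputId, { inputId, summary }); // ~ INSERT ... ON CONFLICT (input_id) DO UPDATE
    return "created" as const;
  });
}
```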

A single LLM call is one async edge. A pipeline of them is a workflow. Transcribe → extract → enrich → embed → notify is five steps, each long-running, each fallible. Once you have more than two such steps in series, you’ve moved from “I have a job runner with idempotency” to “I want a workflow orchestrator.”

How async-ness interacts with each flavour

Flavour | Async fit | Notes
0: DB-as-API | Poor | PostgREST/Supabase have no native long-running operation concept. You bolt on a queue and a worker, at which point you’ve effectively left the flavour for a hybrid. Some platforms (Supabase Edge Functions + queues) paper over this for simple cases.
1: SQL-first | Poor | Async lives outside the DB-as-truth model. Triggers and LISTEN/NOTIFY can dispatch work but don’t help with running it.
2: Transactional script | Workable with discipline | Add a job runner. Each long-running task is a job. Idempotency, retries, principal-passing all need explicit handling. Works fine until your async work becomes the dominant interaction.
3: Modular monolith, shared tx | Strong | Discrete async edges via job runner. Synchronous bulk; async at clear boundaries. The mental model stays clean: the request handler is short; the job is the long-running work. This is the right default for most LLM-driven apps in the 0–3 step pipeline range.
4: Modular monolith, choreographed | Strong but more complex | Async is the default. Outbox + relay. Good when async edges are many and naturally shaped as events. Overkill if you only have a couple of long-running pipelines.
5: Pure hexagonal, orchestrated | Best fit when async-heavy | Workflow runtimes (DBOS, Temporal, Restate) are built for this. Long-running multi-step workflows with durable state, retries, compensation, and observability as first-class concerns. The dividend is large when your app is fundamentally workflow-shaped; the cost is real if it isn’t.
6: Pure hexagonal, choreographed | Strong but heavy | Microservices communicating via events handle async naturally. Operational cost remains the dominant concern.
7: Event Sourcing / CQRS | Async by design | Every read is async-relative-to-write (projections lag). Adds another async axis (command → event → projection) that’s separate from the LLM-pipeline axis.

When a job runner is enough vs. when you want a workflow orchestrator

The decision between “Flavour 3 + job runner” and “Flavour 5 + workflow orchestrator” hinges on the shape of your async work. Concrete heuristics:

A job runner is enough when:

  • Your long-running work is mostly single-step (one LLM call per input; one image generation per upload; one report per request).
  • You can express idempotency at the boundary (job key = natural id of the work; results upsert).
  • Failures are handled by simple retry with backoff; compensation is trivial or unnecessary.
  • The pipeline doesn’t need to reason about its own state; it either succeeds or fails atomically.
  • You have one to three async pipelines in the application.
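The two properties that make a job runner “enough” (dedup by a natural job key, and simple retry with backoff) fit in a few lines. This is a hypothetical in-memory `JobQueue`, not the API of pg-boss, river, or any other runner; time is passed in explicitly so the behaviour is deterministic, and the jitter that real runners usually add is left out.

```typescript
// Hypothetical in-memory job queue: dedup by job key plus capped exponential backoff.
interface QueuedJob {
  key: string; // natural id of the work, e.g. "transcribe:upload-42"
  attempts: number;
  runAt: number; // epoch ms when the job is next eligible to run
  done: boolean;
}

class JobQueue {
  private readonly jobs = new Map<string, QueuedJob>();
  private readonly maxAttempts: number;
  private readonly baseDelayMs: number;
  private readonly maxDelayMs: number;

  constructor(maxAttempts = 5, baseDelayMs = 1_000, maxDelayMs = 60_000) {
    this.maxAttempts = maxAttempts;
    this.baseDelayMs = baseDelayMs;
    this.maxDelayMs = maxDelayMs;
  }

  // Enqueueing an existing key is a no-op: one piece of work, one job.
  enqueue(key: string, now: number): boolean {
    if (this.jobs.has(key)) return false;
    this.jobs.set(key, { key, attempts: 0, runAt: now, done: false });
    return true;
  }

  // Delay after the nth failed attempt: base * 2^(n-1), capped at maxDelayMs.
  backoffMs(attempts: number): number {
    return Math.min(this.maxDelayMs, this.baseDelayMs * 2 ** (attempts - 1));
  }

  // Run each due job once. A failure reschedules it; a job that exhausts its attempts
  // stays not-done so it can be inspected (a dead-letter state).
  tick(now: number, work: (key: string) => void): void {
    this.jobs.forEach((job) => {
      if (job.done || job.runAt > now || job.attempts >= this.maxAttempts) return;
      job.attempts++;
      try {
        work(job.key);
        job.done = true;
      } catch {
        job.runAt = now + this.backoffMs(job.attempts);
      }
    });
  }

  get(key: string): QueuedJob | undefined {
    return this.jobs.get(key);
  }
}
```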

A workflow orchestrator earns its keep when:

  • Pipelines have multiple distinct steps, each long-running and fallible (e.g., transcribe → extract → embed → notify, with each step taking seconds and any one able to fail).
  • Compensation logic is real (if step 4 fails, you need to undo or partially undo steps 1–3).
  • Workflows can pause for external input (waiting for human approval, waiting for a webhook, waiting for a clock).
  • You need durable state at each step that’s queryable for support, debugging, and UX (showing progress to the user).
  • You have many distinct workflow types and want them to share a runtime rather than each reinventing retry/idempotency/state.
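The core idea a workflow orchestrator brings, in a deliberately tiny form: each step’s result is checkpointed under (workflow id, step name), so re-running a workflow after a crash skips the steps that already completed. This illustrates durable execution in general; it is not the API of DBOS or Temporal, and real runtimes also handle scheduling, timeouts, and waiting for external events. `DurableRun` and the pipeline steps are hypothetical.

```typescript
// Checkpoint store: in a Postgres-backed orchestrator this would be a table keyed by
// (workflow_id, step_name). A Map keeps the sketch self-contained.
type CheckpointStore = Map<string, unknown>;

class DurableRun {
  private readonly store: CheckpointStore;
  private readonly workflowId: string;

  constructor(store: CheckpointStore, workflowId: string) {
    this.store = store;
    this.workflowId = workflowId;
  }

  // Record each step's result the first time it completes; on a re-run, return the
  // recorded result instead of executing again, so completed LLM calls aren't repeated.
  step<T>(name: string, fn: () => T): T {
    const key = `${this.workflowId}/${name}`;
    if (this.store.has(key)) return this.store.get(key) as T;
    const result = fn();
    this.store.set(key, result);
    return result;
  }
}

// Hypothetical three-step pipeline; `failAt` simulates a crash inside a step.
function pipeline(run: DurableRun, calls: string[], failAt?: string): string {
  const transcript = run.step("transcribe", () => {
    calls.push("transcribe");
    return "raw transcript";
  });
  const facts = run.step("extract", () => {
    calls.push("extract");
    if (failAt === "extract") throw new Error("model timeout");
    return `facts(${transcript})`;
  });
  run.step("notify", () => {
    calls.push("notify");
    return true;
  });
  return facts;
}
```

One constraint carries over to real runtimes: the code between steps must be deterministic, so a replay reaches the same steps in the same order.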

A common middle ground: stay in Flavour 3, but introduce a workflow orchestrator (DBOS sits naturally on Postgres for this) for the specific multi-step pipelines that need it. The rest of the app (request handlers, simple jobs, CRUD) stays Flavour 3. This is a single-axis movement (you’ve adopted orchestration for one set of flows) rather than a full move to Flavour 5. It preserves the simplicity of Flavour 3 for everything that doesn’t need workflows.

The shape of complexity in LLM apps

For LLM-driven products specifically, it’s worth recognising where complexity tends to concentrate. The application is still a mosaic of sync CRUD and async work, but the parts that are hard, expensive to operate, and most likely to fail in surprising ways live disproportionately in the worker layer:

  • Sync CRUD layers (auth, settings, billing, simple domain reads) are well-understood. Standard patterns. Low operational risk per request. Most engineering teams handle this well by default.
  • The LLM-driven worker layer is where the uncertainty lives. Long-running calls. Variable cost. Variable latency. Failure modes that shift as models update. Idempotency, principal-passing, and tenancy-isolation issues that don’t exist in sync paths.

The practical implication is not “the worker is the application”: by count, most user interactions still hit the sync path. The implication is that engineering attention should be allocated by risk and complexity, not by request volume. Concretely:

  • The worker code path needs the same discipline as the HTTP path: tenancy context, transaction boundaries, structured logging, error handling, observability. It is not a side-channel.
  • Tests need to exercise worker paths as first-class, with the same coverage standards as HTTP paths.
  • Rate limits, circuit breakers, and back-pressure for LLM/external work mostly belong in the worker (or workflow orchestrator), not the HTTP handler.
  • Deployment / autoscaling for the worker tier is driven by LLM throughput and queue depth, not request volume.
  • Per-call cost observability matters in the worker in a way it doesn’t matter for sync CRUD.

The architectures that fail this class of app treat the HTTP layer as primary and the worker layer as a quiet sidecar. They get the priority of attention wrong, then accumulate quiet bugs in the part of the system that matters most.
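One example of worker-side discipline is a minimal circuit breaker wrapped around calls to an LLM provider. This is a sketch under stated assumptions: a single-threaded worker, a hypothetical provider call passed in as fn, and an injectable clock for testing. While the circuit is open the job fails fast. The job runner’s own retry then schedules it for later, instead of piling more requests onto a provider that is already failing.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the provider while the circuit is open."""


class CircuitBreaker:
    """Minimal circuit breaker for a worker's calls to an external provider.

    closed:    calls go through; consecutive failures are counted.
    open:      calls fail fast until reset_after seconds have passed.
    half-open: after that, the next call is a trial; success closes the
               circuit, failure re-opens it.
    Single-threaded sketch: no locking around the shared state.
    """

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_after:
            raise CircuitOpenError("provider circuit open; try again later")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call (half-open) or too many failures (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        self.opened_at = None
        return result
```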

What async-ness means for the choice of flavour

  • If your app is a CRUD app with a few async edges (emails, webhooks, occasional long jobs) → Flavour 3 + job runner. Don’t overthink.
  • If your app is an LLM product with single-step pipelines (input → extract → save) plus regular CRUD around it → Flavour 3 + job runner. Discipline around idempotency and principal-passing matters more than architectural sophistication.
  • If your app is an LLM product with multi-step pipelines (transcribe → extract → enrich → notify, fallibility per step, user-visible progress) → Flavour 3 with a workflow orchestrator for those specific pipelines. Don’t full-convert to Flavour 5; just add the orchestration runtime where it earns its keep.
  • If your app has agentic flows (LLM loops with tool calls, agent delegation, dynamic step graphs) → Flavour 3 with a workflow orchestrator becomes close to mandatory. See the agentic patterns section below.
  • If your app is fundamentally workflow-shaped (loan applications, KYC, complex booking) → Flavour 5 in earnest. The orchestration runtime becomes a dominant infrastructure choice.
  • If your app is async-by-nature across multiple bounded contexts deploying independently → Flavour 6, but only if the team and scale justify it.

The rule of thumb: the architecture must host every async profile your app needs simultaneously, and let each interaction pick the right one. Most apps need three or four simultaneously: sync CRUD for chassis, reliable async for edges, long-running async for LLM work, and possibly orchestrated workflows for multi-step or agentic flows. Flavour 3 + job runner + selective orchestration handles this combination cleanly. Don’t pick one async pattern and force everything through it.

Streaming as an architectural concern

Async-ness is about when work happens: request now, result later. Streaming is about how output is delivered: incrementally, over a connection that stays open. They are related but distinct, and modern LLM products often need both.

A request returning a single result thirty seconds later is async. A request returning tokens one at a time over five seconds is streaming. A request that triggers a long-running job and then streams progress updates as it runs is both. The architectural mechanisms are different, and choosing them deliberately matters.

Categories of streaming

  • Token-by-token LLM output. Increasingly the default UX for chat and even structured generation. The user sees the model “thinking” rather than waiting for the full response. Latency-critical and bandwidth-cheap.
  • Server-Sent Events (SSE) for progress / live updates. A long-running job pushes progress updates to a connected client. Used for “currently processing… currently extracting… done” UX. Simpler than WebSockets; works over plain HTTP.
  • WebSockets for bidirectional real-time communication. Chat-like UIs, live collaboration, presence indicators, anything where the client also pushes to the server in real time. More infrastructure and more failure modes, but more value when bidirectional communication is actually needed.
  • Real-time collaboration (CRDTs / operational transform). Multi-user editing of shared documents. A specialised infrastructure layer entirely; usually a third-party service (Liveblocks, Yjs-based platforms, etc.) unless the product is a collaboration tool itself.
  • Server-side stream processing. Change-data-capture (Debezium), event streaming platforms (Kafka, Redpanda, NATS JetStream), real-time analytics. A different beast: these are infrastructure between services, not user-facing streams.
  • Push notifications. Mobile push, web push, email digests of accumulated events. A delivery mechanism rather than a streaming protocol, but architecturally adjacent.

The first three are user-facing and the most common decisions for product applications. The fourth is specialised. The fifth and sixth are different architectural concerns that just happen to share the word “streaming.”

Why streaming has its own architectural shape

Streaming differs from job-based async in ways that matter:

  • Long-lived connections, not throughput. Scaling is measured in concurrent connections (10K connections holding open SSE streams), not requests per second. Connection limits, load balancer config, and process model all change.
  • Different failure modes. Connection drops, reconnections, backfill of missed events, idempotency of incremental updates. None of these exist for a job that runs once and writes its result to a row.
  • Transactionality is awkward. Streaming partial output before the full operation has committed risks the user seeing data that gets rolled back. Streaming after commit means the user waits longer. The right answer is usually “stream observable progress, then send the final committed result as the closing event.”
  • State is partly on the client. A streaming UI accumulates state from the events it receives. If the connection drops mid-stream, the client has partial state that needs reconciling, usually by re-fetching from the canonical store.
  • Compounds with LLM work. Streaming an LLM’s output is increasingly an expected UX. The token-stream from the model has to flow through your worker, possibly through an orchestrator, and out to the client without buffering.

Streaming + LLM specifically

Streaming LLM output to the client touches multiple layers and is worth designing deliberately:

  1. The model returns a stream. Most LLM providers offer streaming APIs that yield tokens as they’re generated.
  2. The worker (or HTTP handler, depending on architecture) consumes the stream. It can either pass tokens through to the client immediately, or accumulate and process them.
  3. The transport to the client. SSE is usually correct: simple, works over HTTP, supports reconnection. WebSockets are overkill for one-way streaming. HTTP/2 server push is mostly dead in practice.
  4. The persistence story. Tokens stream to the client, but the database probably wants the full result. Two patterns:
    • Stream then persist: stream all tokens, then write the final result in one transaction at the end. The client sees the result before it’s stored, which is fine for ephemeral chat but risky for systems where the canonical store matters.
    • Persist incrementally: write rows as content accumulates (each “complete sentence” or “complete tool call”). Higher write volume; survives crashes mid-generation; closer to the “audit log of what was generated” pattern.

For most products, “stream then persist” is fine. For systems where the streamed content drives downstream actions (an agent loop where each token might be part of a tool call), incremental persistence with a workflow orchestrator is the cleaner answer.
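Here is a minimal sketch of “stream then persist”. It assumes tokens is any iterable of text chunks standing in for a provider’s streaming response; how you obtain that depends on the SDK. Each token is relayed as a Server-Sent Events frame, the full result is written once in a single transaction, and a closing done event carries the committed result. SQLite stands in for Postgres. The web-framework wiring (returning the generator as a text/event-stream response) is left out.

```python
import json
import sqlite3
from typing import Iterable, Iterator, Optional

conn = sqlite3.connect(":memory:")  # stand-in for Postgres
conn.execute("CREATE TABLE messages (id TEXT PRIMARY KEY, body TEXT NOT NULL)")


def sse_frame(data: dict, event: str, event_id: Optional[int] = None) -> str:
    """Format one text/event-stream frame: optional id, event name, JSON data."""
    lines = [] if event_id is None else [f"id: {event_id}"]
    lines += [f"event: {event}", f"data: {json.dumps(data)}"]
    return "\n".join(lines) + "\n\n"


def stream_then_persist(message_id: str, tokens: Iterable[str]) -> Iterator[str]:
    """Relay tokens as they arrive, write the full result once, then send 'done'."""
    parts = []
    for i, token in enumerate(tokens):  # tokens: hypothetical provider stream
        parts.append(token)
        yield sse_frame({"token": token}, "token", event_id=i)
    body = "".join(parts)
    with conn:  # the committed row, not the stream, is the canonical record
        conn.execute("INSERT INTO messages (id, body) VALUES (?, ?)", (message_id, body))
    yield sse_frame({"message_id": message_id, "body": body}, "done")
```

If the client disconnects mid-stream, frameworks typically stop iterating and close the generator, so nothing is persisted. That is acceptable for ephemeral chat. Where the generated content must survive, persist incrementally instead.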

How streaming interacts with each flavour

  • Flavours 0, 1, 2: No native streaming model. You’d add SSE/WebSocket as a sidecar or a custom endpoint. Workable for simple cases.
  • Flavour 3: Standard pattern is a streaming endpoint that’s a separate HTTP handler from the synchronous CRUD ones. The handler holds the connection open while a worker (or in-process generator) produces output. The shared-tx model doesn’t apply to streaming endpoints; they’re naturally outside transactional flows.
  • Flavour 4: Workers can publish progress events to a separate channel (Redis pub/sub, Postgres LISTEN/NOTIFY, dedicated stream service). The streaming endpoint subscribes and forwards.
  • Flavour 5 (orchestrated): Workflow orchestrators usually have hooks for emitting progress events that can be surfaced to clients. DBOS and Temporal both support this in different ways.
  • Flavour 6 (microservices): Streaming usually goes through dedicated infrastructure (a streaming gateway, a websocket service). Cross-service coordination via the event bus.
  • Flavour 7 (ES/CQRS): Naturally streaming-friendly: projections can produce continuous output, and the event log is itself a stream. Subscriptions to the event log become a primitive.

Command-then-poll: the workhorse pattern

Before reaching for streaming, recognise the pattern that handles most async UX cleanly without holding connections open: command-then-poll. The shape:

  1. Client POSTs a command. Server validates, enqueues the work, returns 202 with a job/workflow ID.
  2. Client polls a status endpoint (GET /jobs/{id} or similar) every 1–2 seconds.
  3. Server returns the current status (pending, running with an optional current step, done, or failed) plus the final result if available.
  4. Client stops polling when it sees done or failed.

This pattern is the right default for most long-running async work. It’s simpler than streaming, requires no special transport, works over standard HTTP, survives flaky networks (the client just retries the next poll), and composes naturally with workflow orchestrators that expose state as a queryable resource. DBOS specifically lets you query a workflow’s state directly: you can see which step it’s on, what intermediate results are available, and whether it’s still running. Temporal does similarly. pg-boss has a more limited introspection surface but you can query the job table directly to drive the same UX.
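The server side of command-then-poll fits in a few lines. This sketch assumes hypothetical routes (POST /summaries, GET /jobs/{id}) and keeps jobs in an in-memory dict that stands in for a jobs table. Handlers return (status code, body) pairs rather than depending on any particular web framework.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class Job:
    id: str
    status: str = "pending"       # pending -> running -> done | failed
    step: Optional[str] = None    # optional progress detail, e.g. "extracting"
    result: Optional[dict] = None


JOBS: Dict[str, Job] = {}  # stand-in for a jobs table (or the job runner's own table)


def post_command(payload: dict) -> Tuple[int, dict]:
    """Hypothetical POST /summaries: validate, enqueue, answer 202 with the job id."""
    if "document_id" not in payload:
        return 400, {"error": "document_id is required"}
    job = Job(id=str(uuid.uuid4()))
    JOBS[job.id] = job  # a worker picks it up and updates status/step/result
    return 202, {"job_id": job.id, "status_url": f"/jobs/{job.id}"}


def get_status(job_id: str) -> Tuple[int, dict]:
    """Hypothetical GET /jobs/{id}: current status, plus the result once done."""
    job = JOBS.get(job_id)
    if job is None:
        return 404, {"error": "unknown job"}
    body = {"status": job.status}
    if job.step is not None:
        body["step"] = job.step
    if job.status == "done":
        body["result"] = job.result
    return 200, body
```

The client side is a loop: GET status_url every 1–2 seconds and stop once status is done or failed.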

Why command-then-poll is usually right for v1:

  • Backwards-compatible with any client. No special browser features required, no reconnection logic needed, no infrastructure for long-lived connections.
  • Trivially horizontal. A poll request hits any server, queries the job state, returns. No connection affinity needed.
  • Composes with caching. Status responses can have short cache headers; clients get cheap “still pending” responses.
  • Plays well with mobile and flaky networks. If the poll fails, the client just polls again. No reconnection state to manage.

When to upgrade to streaming:

  • The polling endpoint generates significant database load (very high-volume products at scale).
  • The UX needs sub-second progress updates (token-by-token output, live transcription).
  • You’re showing streaming content the user wants to see as it’s generated (LLM responses, video transcoding progress with frame-level updates).

For most products in most situations, polling at 1–2 second intervals is the right answer. The temptation to reach for streaming because it sounds more sophisticated is usually wrong. SSE is a great upgrade path when you need it; don’t pre-emptively pay for it.

What this means in practice

  • Don’t conflate streaming with async. You can have one without the other. A long-running job that returns a single result at the end is async, not streaming. A streaming chat response that completes in 3 seconds is streaming, not async. A long-running job that streams progress is both.
  • Default to command-then-poll for async UX. Issue the command, return a job ID, poll for status. SSE is an upgrade you adopt when polling isn’t enough, not a default to start with.
  • Use SSE before WebSockets. Simpler, works over HTTP, supports automatic reconnection, no extra infrastructure. Reach for WebSockets only when you need bidirectional communication or sub-second presence updates.
  • Plan for reconnection. Any streaming UX needs to handle “connection dropped, client reconnects” gracefully. Either resume from where you left off (keep stream state server-side) or restart and let the client reconcile.
  • Stream observable progress, persist at commit. The committed state is the source of truth; streaming events are a UX optimisation. Don’t make the stream the canonical record.
  • Streaming is mostly UX, not architecture. It mostly sits at the edges. The architectural decisions (flavour, async pattern, tenancy) usually don’t change because of streaming requirements; you add a streaming-capable transport at the boundary.
  • Tenancy still applies. A streaming connection is still scoped to a tenant. Set the context once at connection establishment and hold it for the lifetime of the connection. The connection is an authenticated session, not a stateless request, but the security model is the same.
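SSE has a built-in hook for the “resume from where you left off” option. When a browser’s EventSource reconnects, it resends the last id it received in a Last-Event-ID header. Below is a minimal sketch of the server side. It assumes a hypothetical ResumableStream class holding an in-memory, bounded buffer per stream; a real deployment would need something shared across server processes.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class ResumableStream:
    """Hypothetical per-stream buffer of recent events for reconnecting clients.

    EventSource resends the last `id:` it received in a Last-Event-ID header
    when it reconnects; the server replays the events that follow it.
    """

    def __init__(self, max_events: int = 1000):
        # Bounded: once full, appending evicts the oldest event.
        self.events: Deque[Tuple[int, str]] = deque(maxlen=max_events)
        self.next_id = 0

    def publish(self, data: str) -> int:
        event_id = self.next_id
        self.next_id += 1
        self.events.append((event_id, data))
        return event_id

    def replay(self, last_event_id: Optional[str]) -> Optional[List[Tuple[int, str]]]:
        """Events after last_event_id, or None if some were already evicted."""
        if last_event_id is None:
            return []  # fresh connection: client loads current state via a normal GET
        last = int(last_event_id)
        if self.events and last + 1 < self.events[0][0]:
            return None  # gap can't be filled: client must re-fetch canonical state
        return [(i, data) for i, data in self.events if i > last]
```

Returning None when the buffer no longer covers the gap is the “restart and let the client reconcile” fallback: the client re-fetches canonical state.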

Concurrent updates and conflict resolution

Whenever two writes can target the same data, the system has to answer: what happens? The naive answer (“last write wins”) works fine until it doesn’t, and the failure mode (silent data loss, often invisible to the user) is one of the worst kinds. This deserves explicit treatment because the choice of strategy interacts with flavour, with the multi-device-single-user case, and with the LLM-extraction-vs-manual-correction case that’s specific to LLM-driven products.

When this is a problem (and when it isn’t)

For pure single-user, single-device CRUD, conflicts are rare and usually benign. The user clicked “save” twice; the second save wins. Fine.

The cases where conflict resolution becomes architectural:

  • Multi-user collaboration. Two users edit the same record at the same time. Without a strategy, one user’s work is silently overwritten.
  • Multi-device single-user. Common in modern products. The same user has the iPhone app open and the desktop browser open; they edit on both. Same problem class as multi-user, harder to dismiss as “the user’s fault.”
  • LLM-extraction races. The user manually corrects an extracted record; meanwhile a re-extraction job runs against the original input and produces new extracted output. Without a strategy, the manual correction is silently overwritten by the re-extraction.
  • Background jobs racing with user actions. A daily refresh job updates a record; the user is currently editing it. Same problem.
  • Distributed system writes. In Flavour 6 / microservices, multiple services may produce updates that all target a shared read model. Always conflict-prone.

If your app has any of these (and most non-trivial apps have at least one), you need an explicit strategy.

The strategies, ranked by complexity

Last-write-wins (LWW). No version checking. Whoever writes last wins. The other write is silently lost.

  • When acceptable: low-stakes data where users won’t notice loss, or where conflicts are very rare.
  • When dangerous: anything users have invested attention in. “I typed three paragraphs and they vanished” is a product-killing UX.

Optimistic concurrency control (OCC). Every record has a version column (or an etag, or an updated_at timestamp used as a token). Reads return the version; writes include the expected version; the server only accepts the write if the version matches. If it doesn’t, the client gets a conflict response and decides how to handle it (usually: refresh and ask the user to redo).

  • When right: the default for almost every CRUD app. Cheap, well-understood, supported natively by most ORMs.
  • Limitations: doesn’t merge. The user’s experience on conflict is “your changes were rejected, please redo.” For most workflows, that’s fine.
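Here is a minimal OCC sketch, using SQLite in place of Postgres so it runs anywhere; the UPDATE ... WHERE version = ? compare-and-set has the same shape in Postgres. The notes table, update_note function and ConflictError are hypothetical. An HTTP layer would map ConflictError to 409.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for Postgres
conn.execute(
    "CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT NOT NULL,"
    " version INTEGER NOT NULL DEFAULT 1)"
)
with conn:
    conn.execute("INSERT INTO notes (id, body) VALUES (1, 'draft')")


class ConflictError(Exception):
    """Stale version token; an HTTP layer would map this to 409 Conflict."""


def update_note(note_id: int, new_body: str, expected_version: int) -> int:
    """Write only if nobody changed the row since we read expected_version."""
    with conn:
        cur = conn.execute(
            "UPDATE notes SET body = ?, version = version + 1"
            " WHERE id = ? AND version = ?",
            (new_body, note_id, expected_version),
        )
    if cur.rowcount == 0:  # stale version (or the row no longer exists)
        raise ConflictError(f"note {note_id} changed since version {expected_version}")
    return expected_version + 1
```

The check and the write are a single atomic UPDATE, so no explicit lock is needed. A rowcount of zero means someone else got there first (or the row no longer exists).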

Pessimistic locking. SELECT ... FOR UPDATE takes a row-level lock; other transactions that try to update or lock the same row block until your transaction commits or rolls back. Alternatively, use application-level locks (Redis-based locks, Postgres advisory locks).

  • When right: hot rows that are written frequently and contention is high. Inventory counters, queue heads, balance updates.
  • When wrong: anywhere with user-driven workflows. A user holds a lock by clicking edit, then goes to lunch; everyone else is blocked.

Domain-specific merge. Conflict resolution that knows the data type. For a counter, sum the deltas instead of taking one. For a set, union both writes. For a tag list, union or intersect by intent. For independent fields of a record, merge per-field rather than per-record.

  • When right: counters, sets, simple aggregates. Cheap to implement; eliminates many trivial conflicts.
  • When wrong: anywhere semantics are ambiguous (what does it mean to merge two prose edits?).
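Domain-specific merge usually comes down to a few small functions. The sketch below assumes each writer knows the base version it started from (for example via the OCC token). It uses a per-field three-way merge for records, delta-merge for counters, and union for add-only tags. The function names are hypothetical.

```python
from typing import Any, Dict, List, Set, Tuple


def merge_fields(
    base: Dict[str, Any], mine: Dict[str, Any], theirs: Dict[str, Any]
) -> Tuple[Dict[str, Any], List[str]]:
    """Per-field three-way merge against the version both writers started from.

    A field changed on one side only takes that side's value; a field changed
    differently on both sides keeps `theirs` (the stored value) and is
    reported as a conflict for the user to resolve.
    """
    merged: Dict[str, Any] = {}
    conflicts: List[str] = []
    for key in set(base) | set(mine) | set(theirs):
        b, m, t = base.get(key), mine.get(key), theirs.get(key)
        if m == b or m == t:
            merged[key] = t  # I didn't change it, or we agree
        elif t == b:
            merged[key] = m  # only I changed it
        else:
            merged[key] = t  # both changed it differently: a real conflict
            conflicts.append(key)
    return merged, sorted(conflicts)


def merge_counter(base: int, mine: int, theirs: int) -> int:
    """Counters: apply both deltas instead of keeping one write."""
    return base + (mine - base) + (theirs - base)


def merge_tags(mine: Set[str], theirs: Set[str]) -> Set[str]:
    """Tags whose intent is 'add': union both writes."""
    return mine | theirs
```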

CRDTs (Conflict-free Replicated Data Types). Mathematical structures that converge regardless of update order. Used by Yjs, Automerge, Liveblocks-style platforms.

  • When right: real-time multi-user collaboration on rich content (documents, whiteboards, structured data with nested edits). Multi-device offline-first apps.
  • When wrong: most CRUD apps. The mathematical structure is restrictive, the libraries have a learning curve, and you don’t need it if your conflict rate is low and OCC works.
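The textbook grow-only counter (G-Counter), one of the simplest CRDTs, shows both what “converge regardless of update order” means and why the structure is restrictive. It is an illustration only, not how Yjs or Automerge represent data. Each replica increments its own slot, and merge takes the per-replica maximum.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: one slot per replica; merge = per-slot maximum."""

    def __init__(self, replica_id: str):
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, n: int = 1) -> None:
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # max is commutative, associative and idempotent, so replicas converge
        # whatever order (and however many times) states are exchanged.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())
```

The restriction shows immediately: this counter can only grow. Decrements need a second counter (a PN-Counter), and rich-text editing needs far more machinery, which is what the libraries above provide.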

Operational Transformation (OT). The technique behind Google Docs (originally). Server-mediated; transforms operations relative to each other so they apply correctly. Older than CRDTs and harder to get right.

  • When right: essentially never for new systems. CRDTs supersede OT for most use cases.

Manual merge / branching. Git-style. Both versions are preserved; the user resolves the conflict.

  • When right: high-stakes content where neither side should be silently lost (legal documents, code, anything where audit matters more than UX).
  • When wrong: high-frequency conflicts. Users will not tolerate frequent merge prompts.

How conflict resolution interacts with flavour

Flavour | Typical strategy
0: DB-as-API | OCC at the row level (timestamps or version columns), enforced via constraints or triggers.
1: SQL-first | OCC; ORMs (ActiveRecord etc.) often build this in.
2: Transactional script | OCC; explicit version checks in handlers, returning 409 on mismatch.
3: Modular monolith | OCC as default, domain-specific merge where it pays off, pessimistic locks for hot rows. The shared transaction simplifies the logic: you can read-modify-write atomically within one tx.
4: Modular monolith, choreographed | OCC + sagas for cross-module conflicts. Eventual consistency means more conflicts than Flavour 3.
5: Pure hex, orchestrated | OCC at the storage adapter; the orchestrator can also serialise concurrent workflow steps targeting the same resource.
6: Microservices | OCC + sagas + domain-specific merge. Services may need conflict resolution at the read-model level (multiple services produce updates to the same view).
7: Event Sourcing | Conflicts mostly disappear at the write side (events are appends; appends don’t conflict). They reappear at the projection side, where two events targeting the same aggregate need a deterministic projection rule.

The multi-device single-user case

Worth pulling out because it’s the case most teams underestimate. Modern users routinely have:

  • iPhone app and desktop browser open simultaneously.
  • Phone in pocket while editing on laptop.
  • Old session on a forgotten tab from yesterday.

Treating this as “single-user, no conflicts possible” is wrong and produces silent-data-loss bugs that are hard to reproduce and devastating when they’re discovered. Two patterns worth adopting from day one even in nominally single-user products:

  • Always include OCC tokens on writes. A version column or updated_at on every mutable record. Writes that include a stale token are rejected with 409. Cost: one column. Benefit: no silent overwrites between the user’s iPhone and laptop.
  • Show “this was modified elsewhere” UX gracefully. When the client gets a 409, refresh the underlying state and show the user what changed before they retry. Bad UX: “Your changes were lost.” Good UX: “Someone (you?) updated this on another device. Here’s what changed; do you want to keep your version, theirs, or merge?”

For most products, this is enough. Real-time collaboration is a different beast and isn’t needed until the product actually involves simultaneous edit (which most don’t, even when the marketing says “collaborative”).

LLM-extraction vs manual-correction races

A specific case for LLM-driven products: the system extracts structured data from input, the user manually corrects part of it, and a re-extraction (triggered by a prompt change, ontology update, or user request) produces new extracted output. What happens to the manual correction?

The naive answer (“the latest extraction wins”) silently destroys user effort, which is a particularly bad UX in a product whose value proposition involves trusting the LLM’s output.

Two patterns:

  • manually_edited_at / manual_override flags. Each field (or each record, depending on granularity) has metadata indicating it was manually edited. Re-extraction skips fields with active manual overrides, or surfaces the conflict for explicit resolution.
  • Layered storage: extraction layer + correction layer. Two distinct stores. The extraction layer is overwritten freely by re-extractions. The correction layer holds explicit user edits. The displayed value is correction-layer-if-present, else extraction-layer. Cleaner conceptually but more code.

For most LLM-driven products, the flag pattern is enough. The architectural insight is just: manual corrections must not be silently overwritten by automated processes. This is a recurring source of trust-destroying bugs in extraction products.
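Below is a minimal sketch of both patterns. It assumes a record stored as a field-to-value dict, plus a hypothetical set of manually edited field names standing in for per-field manually_edited_at or manual_override flags.

```python
from typing import Any, Dict, Set, Tuple


def apply_reextraction(
    current: Dict[str, Any], manually_edited: Set[str], extracted: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, Tuple[Any, Any]]]:
    """Flag pattern: a re-extraction only overwrites fields the user hasn't edited.

    Returns the updated record and, for each protected field where the new
    extraction disagrees, (kept_manual_value, new_extracted_value).
    """
    updated = dict(current)
    disagreements: Dict[str, Tuple[Any, Any]] = {}
    for name, value in extracted.items():
        if name in manually_edited:
            if current.get(name) != value:
                disagreements[name] = (current.get(name), value)
            continue  # never silently overwrite a manual correction
        updated[name] = value
    return updated, disagreements


def displayed_value(name: str, extraction: Dict[str, Any], corrections: Dict[str, Any]) -> Any:
    """Layered pattern: show the correction layer if present, else the extraction."""
    return corrections[name] if name in corrections else extraction.get(name)
```

Returning the disagreements, rather than just skipping them, is what makes it possible to surface the conflict for explicit resolution.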

What this means in practice

  • Add OCC tokens to every mutable record from day one. A version integer or an updated_at timestamp used as a token. Cheap; prevents an entire class of bugs.
  • Don’t reach for CRDTs unless you have real-time collaboration. They’re powerful but restrictive; OCC covers the vast majority of apps.
  • Treat “modified elsewhere” as a UX problem, not just a backend problem. The conflict response needs a graceful user-facing flow.
  • Protect manual corrections from automated overwrites. A flag, a separate layer, or explicit conflict resolution, but not silent overwriting.
  • For real-time collaboration, prefer a third-party platform (Liveblocks, Yjs-based) over rolling your own. Done well, it’s a multi-month engineering project; done poorly, it’s a recurring data-loss bug.
  • Use pessimistic locking sparingly. It’s the right tool for hot-row scenarios but the wrong tool for user-driven workflows.

Distributed transactions: protocols that exist, and why you usually don’t want them

A natural question arises in pure-hexagonal architectures (Flavour 5 and 6) and any system that spans multiple storage backends: if the database is “just an adapter” and the domain layer doesn’t know about transactions, how do you get atomic updates across aggregates that might live in different storage systems? Do you have to invent your own protocol? Doesn’t that mean reinventing the semantics of the storage layer?

The short answer: at the application level, yes, you’d mostly be reinventing storage semantics. That’s why the modern answer is “don’t try.” This deserves explicit treatment because the alternatives (and their trade-offs) aren’t obvious.

The protocols that actually exist

These are real, named protocols you’ll encounter in distributed systems literature:

  • Two-Phase Commit (2PC). The classical protocol. A coordinator asks all participants to “prepare” (write but don’t commit, hold locks). If all say yes, the coordinator says “commit.” If any says no, the coordinator says “abort.” Postgres supports it via PREPARE TRANSACTION; many message brokers and databases implement it via the XA standard. Famously brittle: if the coordinator crashes between prepare and commit, participants are stuck holding locks until it recovers. Doesn’t scale: the coordinator is a synchronisation bottleneck and a single point of failure. Largely considered an anti-pattern for new systems at this point.
  • Three-Phase Commit (3PC). Attempts to fix 2PC’s blocking problem by adding a pre-commit phase. Still has failure modes (network partitions). Rarely used in practice.
  • Paxos / Raft / Multi-Paxos. Consensus protocols. Not really application-level distributed transactions; they coordinate distributed state across replicas of a single logical system. You don’t “use” Paxos at the application level; you use a database that uses Paxos internally (etcd, CockroachDB, Spanner).
  • Spanner-style TrueTime / hybrid logical clocks. Google’s Spanner uses synchronised atomic clocks + GPS to provide externally consistent transactions across geo-distributed databases. CockroachDB replicates the approach using hybrid logical clocks instead of atomic clocks. This is a database-level solution: you get distributed ACID because the database provides it; nothing changes at the application level except your BEGIN/COMMIT now spans rows that might be in different regions.
  • Sagas. Not actually a transaction protocol. A pattern where you sequence local transactions with compensating actions for rollback. If step 3 of 5 fails, you run compensations for steps 2 and 1. Compensation logic is application-defined. Eventual consistency only. Originally from a 1987 database paper; adopted by the microservices community as the standard answer to “how do I do cross-service transactions?”
  • Try-Confirm-Cancel (TCC). A saga variant where the “try” phase reserves resources (similar to 2PC’s prepare) and then “confirm” or “cancel” runs. Common in financial systems. Application-level protocol; the framework helps coordinate but the semantics are application-defined.
  • Outbox + idempotent consumers. Not a transaction protocol but the dominant pragmatic answer. Write to your local DB and an outbox table in one local transaction. A relay reads the outbox and publishes events. Consumers are idempotent. You get eventual consistency without distributed transactions, at the cost of windows where one operation is visible and the other isn’t yet.
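Here is a minimal sketch of an outbox with an idempotent consumer. SQLite stands in for Postgres, and for brevity the consumer’s tables live in the same database; in practice the consumer usually owns its own store. Table and function names are hypothetical. INSERT OR IGNORE is SQLite’s spelling of Postgres’s INSERT ... ON CONFLICT DO NOTHING.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")  # stand-in for Postgres
conn.executescript("""
CREATE TABLE orders (id TEXT PRIMARY KEY, total INTEGER NOT NULL);
CREATE TABLE outbox (
    event_id TEXT PRIMARY KEY,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
CREATE TABLE invoices (order_id TEXT PRIMARY KEY, total INTEGER NOT NULL);
""")


def place_order(total: int) -> str:
    """Business write and outbox write commit (or roll back) together."""
    order_id = str(uuid.uuid4())
    event = {"type": "order_placed", "order_id": order_id, "total": total}
    with conn:
        conn.execute("INSERT INTO orders (id, total) VALUES (?, ?)", (order_id, total))
        conn.execute(
            "INSERT INTO outbox (event_id, payload) VALUES (?, ?)",
            (str(uuid.uuid4()), json.dumps(event)),
        )
    return order_id


def relay(publish) -> int:
    """Publish unpublished outbox rows. A crash after publish but before the
    UPDATE means the event is sent again: delivery is at-least-once."""
    rows = conn.execute("SELECT event_id, payload FROM outbox WHERE published = 0").fetchall()
    for event_id, payload in rows:
        publish(event_id, json.loads(payload))
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE event_id = ?", (event_id,))
    return len(rows)


def handle_order_placed(event_id: str, event: dict) -> None:
    """Idempotent consumer: record the event id in the same tx as the effect."""
    with conn:
        already_seen = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)", (event_id,)
        ).rowcount == 0
        if already_seen:
            return  # duplicate delivery: the effect was already applied
        conn.execute(
            "INSERT INTO invoices (order_id, total) VALUES (?, ?)",
            (event["order_id"], event["total"]),
        )
```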

Why 2PC fell out of favour at the application level

You can use XA across Postgres + a JMS broker + another system. The protocol exists; tools support it. But almost no one builds new systems this way, for reasons that are worth being explicit about:

  • Locks held across the prepare/commit window. Participants must hold their locks from “prepare” until “commit” arrives. If the coordinator is slow or unreachable, every participant is blocked. Lock contention becomes a system-wide problem rather than a per-database problem.
  • Coordinator failure is catastrophic. If the coordinator crashes after sending “prepare” but before sending “commit,” participants are stuck. Recovery requires the coordinator to come back, look at its log, and finish what it started. In the meantime, throughput collapses.
  • Performance cost compounds with participants. Every commit needs at least two round trips (prepare, then commit), and each round waits for the slowest participant. Each participant added makes every transaction slower.
  • It doesn’t compose with HTTP. You can’t sensibly do 2PC across REST services. The protocol assumes long-lived participant connections to the coordinator, which doesn’t match how modern services communicate.
  • Operational complexity. Coordinator recovery logs, in-doubt transaction resolution, monitoring of prepared-but-not-committed states: these are new operational concerns that come with the protocol.

In modern systems, application-level 2PC has largely been displaced by sagas (for cross-service work) or by databases that handle distribution internally (for cross-region work).

The “you’d reinvent SQL semantics” observation

The deeper point the question raises is valid. If you try to build cross-store atomicity at the application level, you’re essentially building a new mini-database, with its own concurrency control, recovery, isolation levels, and durability guarantees. And you’ll do it worse than a real database does. Databases have spent decades getting transaction semantics right; the WAL, MVCC, lock managers, deadlock detection, and crash recovery in Postgres represent enormous engineering effort. Rebuilding any of that at the application layer is a tarpit.

This is why the modern answer to “how do I get atomicity across stores” is “don’t have atomicity across stores.” Restructure the problem so atomicity is required only within a store that provides it.

The pragmatic answers in pure-hex world

Concretely, when working in Flavour 5 or 6:

  1. Co-locate aggregates that need atomic updates within the same adapter. The bounded-context boundary should match the storage boundary. If two things must be updated atomically, they belong in the same context, served by the same adapter, sharing the same transaction-capable store. The DDD rule “one aggregate per transaction” is essentially this insight: stop trying to be atomic across boundaries.
  2. Use eventual consistency between aggregates that don’t need atomic updates. Domain events flow between aggregates; consumers are idempotent; compensations exist for failure paths. This is the classical answer and almost always the right one.
  3. For cross-context work, use a workflow orchestrator to manage the saga. DBOS, Temporal, Restate. The orchestrator handles durable steps and compensation logic: you don’t invent the orchestration protocol; you write the saga steps and the orchestrator handles durability, retries, and resumability.
  4. If you need distributed ACID, use a database that provides it. CockroachDB, Spanner, FoundationDB. Let the storage adapter handle the distributed transaction. This is a deliberate departure from Postgres-for-everything, justified only when geo-distribution or scale makes a single Postgres insufficient. Note that for the audience of this post, this is essentially never the right starting point.
  5. 2PC at the application level: essentially never. Even when you can do it, the costs almost always outweigh the benefits. The exceptions are narrow (specific compliance requirements, integration with legacy XA-aware systems) and unlikely to apply to a solo founder’s product.
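Item 3 hands the saga to an orchestrator. To show what that takes off your hands, here are the bare saga semantics as an in-process Python sketch: run local steps in order and, if one fails, run the compensations for the completed steps in reverse. It has none of the durability an orchestrator adds; a crash mid-compensation loses the loop entirely, which is exactly why the recommendation is not to hand-roll it. All names are hypothetical.

```python
from typing import Callable, List, Tuple

# (step name, action, compensation); each action stands for one local transaction.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; if one fails, compensate completed steps in reverse.

    Returns (succeeded, log). Not durable: if the process dies mid-way, nothing
    remembers which compensations are still owed. Compensation failures are
    not handled here either.
    """
    log: List[str] = []
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
        completed.append((name, compensate))
        log.append(f"done:{name}")
    return True, log
```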

How this relates to the flavours

  • Flavours 0–3 (DB-as-API, SQL-first, Transactional script, Modular monolith with shared tx): No distributed transaction problem. You have one Postgres; atomicity across aggregates is just BEGIN ... COMMIT. This is why this post’s default recommendation (Flavour 3) sidesteps the entire question.
  • Flavour 4 (Choreographed monolith): Eventual consistency between modules; sagas for compensation. The single Postgres is still the substrate, but cross-module updates aren’t atomic; that’s the whole point of moving to choreography.
  • Flavour 5 (Hexagonal, orchestrated): Workflow orchestrator manages the saga. Atomicity at each orchestrator step (within a single adapter). Compensation across steps. No 2PC.
  • Flavour 6 (Hexagonal, choreographed / microservices): Sagas via event bus; compensations everywhere; eventual consistency baked in. Distributed ACID requires either a database that provides it (with every service’s adapter pointing at it) or accepting the eventual-consistency story.
  • Flavour 7 (Event Sourcing): Events are atomic-per-aggregate; cross-aggregate transactions are by definition eventually consistent (projections lag, sagas mediate).

What this means in practice

  • Stay in a flavour with a shared transaction (3 or below) until you have a concrete reason to leave it. Most distributed-transaction problems are self-inflicted by adopting a distributed architecture before it was needed.
  • When you do split (Flavour 4+), design for eventual consistency from the start. Don’t try to bolt distributed atomicity back on. The architecture is making a different trade-off; honour it.
  • Use a workflow orchestrator for cross-context business processes. DBOS, Temporal, or equivalent. The orchestrator becomes your “distributed transaction”: durable, resumable, compensable, without the 2PC pathologies.
  • If you find yourself wanting application-level 2PC, treat that as a signal that your aggregate boundaries are wrong. Either the things that need atomicity should be merged into one aggregate, or one of them shouldn’t need to be atomic with the other.

Agentic patterns and the LLM-call loop

Architectural writing about LLM systems mostly assumes a single-call shape: prompt in, structured output out, persist the result. As products get more capable, this shape breaks down. The system makes one LLM call, the model says “I want to call function X,” the runtime executes X, the result is fed back, the model continues, possibly making another tool call, and so on until the model produces a final answer.

This is qualitatively different from a single async edge, and it changes the architecture in specific ways.

The patterns to recognise

  • Tool-calling loops. A single LLM “task” is now a recursive async pipeline of unknown length. Each iteration is a long-running call. Any iteration can fail. The whole thing can run for minutes and consume dollars in API costs.
  • Programmatic handoff. Deterministic code makes a deliberate decision to delegate a sub-task to an LLM agent (or vice versa), with a defined input contract and output contract. The boundary is intentional and bounded. This is the cleanest agentic pattern from an architecture standpoint, because the handoff has a clear interface.
  • Agent delegation. One agent (an “orchestrator agent”) decomposes a task and spawns sub-agents (a researcher, a writer, a critic), each with their own context and tool access. The structure may be hierarchical (parent-child) or graph-shaped (agents reading each other’s outputs).
  • Multi-agent coordination. Agents communicate with each other through some mechanism: shared state, structured handoff protocols, message buses. State is partial, distributed, and changing. The oldest version of the shared-state model is the blackboard architecture: a shared structured workspace that specialised agents read from and write to, while a control component decides which agent acts next. It maps cleanly onto a database table or workflow state that each agent updates in turn, and it makes coordination legible: every contribution is recorded, and every decision about who acts next is explicit rather than buried in message-passing between agents.
  • Long-lived agents. An agent persists state across user sessions. It runs autonomously between user touches. It accumulates context and may have its own goals or schedules.
  • Shared machinery for humans and agents. Nothing requires the coordination primitives to be agent-only. A human reviewer and an LLM agent can read from and write to the same blackboard, claim work from the same queue, and move a task through the same workflow steps. Treating a person as one more contributor to the shared state buys consistency (one set of interfaces to build and reason about) and rigour (the same audit trail, conflict resolution, and principal checks apply to both). It also makes human-in-the-loop a first-class case rather than a bolt-on: an approval step is the same primitive as a tool call, backed by a person instead of a model.

Most product applications adopting agentic patterns today land somewhere in the first three. The fourth and fifth (multi-agent coordination and long-lived agents) are still rare and highly experimental.
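
The following is a minimal in-memory sketch of the blackboard model, including the human-as-contributor case. `Contributor`, `BoardEntry`, and `runBlackboard` are hypothetical names. The board is a `Map` standing in for a database table or workflow state, and the control component simply picks the first contributor whose precondition holds. The human step also returns immediately here; in a real system it would be a durable pause that waits for the person to act.

```typescript
type Kind = "agent" | "human";

// One recorded contribution: who wrote what, so every step is auditable.
interface BoardEntry { author: string; kind: Kind; key: string; value: string }

// Hypothetical contributor: an agent or a person, behind the same interface.
interface Contributor {
  name: string;
  kind: Kind;
  canAct: (board: Map<string, string>) => boolean;
  act: (board: Map<string, string>) => { key: string; value: string };
}

// Control loop: pick the first contributor that can act, apply and record its
// contribution, and stop when nobody can act or maxTurns is reached.
function runBlackboard(contributors: Contributor[], maxTurns: number): BoardEntry[] {
  const board = new Map<string, string>();
  const entries: BoardEntry[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    const next = contributors.find((c) => c.canAct(board));
    if (!next) break;
    const { key, value } = next.act(board);
    board.set(key, value);
    entries.push({ author: next.name, kind: next.kind, key, value });
  }
  return entries;
}

const contributors: Contributor[] = [
  { name: "researcher", kind: "agent", canAct: (b) => !b.has("notes"),
    act: () => ({ key: "notes", value: "3 sources" }) },
  { name: "writer", kind: "agent", canAct: (b) => b.has("notes") && !b.has("draft"),
    act: (b) => ({ key: "draft", value: `draft from ${b.get("notes")}` }) },
  // The human reviewer uses exactly the same primitive as the agents.
  { name: "alice", kind: "human", canAct: (b) => b.has("draft") && !b.has("approval"),
    act: () => ({ key: "approval", value: "approved" }) },
];

const turns = runBlackboard(contributors, 10);
// Authors in order: researcher, writer, alice
```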

Why agentic systems push hard toward orchestration

A single LLM call fits cleanly into “Flavour 3 + job runner.” A tool-calling loop or a delegation graph does not, for several reasons:

  • The structure isn’t known up front. A workflow orchestrator that supports dynamic step extension (each tool call becomes a step at runtime) handles this naturally. A job runner with fixed pipelines does not; you’d end up reinventing dynamic workflow shape.
  • Partial progress matters for both UX and cost. An agent has done 15 tool calls and the 16th fails. You don’t want the retry to redo the first 15; they were expensive, and possibly not idempotent on the external side. Durable per-step state, which is what an orchestrator gives you, is the only clean answer.
  • Resumability matters. Agents can run for minutes. The worker process can restart for any reason. Restarting from scratch loses real money and produces a worse user experience. Orchestrators persist step results and resume.
  • Pausing matters. Agentic flows often need to wait: for a webhook, a human approval, a scheduled time, an external job. Workflow orchestrators handle pauses as a primitive. Job runners do not.
  • Observability gets harder. “Why did this agent loop produce that output?” requires reconstructing every step, every tool call, every model decision. Orchestrators record this by design; in a job runner, you have to build the recording yourself.
  • Cost compounds. A bug in a job runner makes one extra LLM call. A bug in an agent loop can make hundreds. Per-workflow cost limits, which the orchestrator is the natural place to enforce, become a safety mechanism.
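
The partial-progress and resumability points above rest on one mechanism: record each step's result durably, and on retry return the recorded result instead of re-running the step. The following is a minimal in-memory sketch of that mechanism. `durableStep`, `runAgent`, and the `Map` standing in for a Postgres table are all hypothetical. A real agent loop also decides its next tool call at runtime rather than following the fixed list used here. Orchestrators such as DBOS and Temporal rely on workflow code being deterministic, so that on replay step N lines up with the recorded result for step N, and this sketch assumes the same.

```typescript
// Stand-in for a durable store, e.g. a table keyed by (workflow id, step index).
// It lives outside runAgent, so it "survives" a failed attempt in this sketch.
type StepStore = Map<string, string>;

// Run one step, or return its recorded result if it already completed.
function durableStep(store: StepStore, workflowId: string, index: number, fn: () => string): string {
  const key = `${workflowId}:${index}`;
  const saved = store.get(key);
  if (saved !== undefined) return saved; // replay: skip the expensive call
  const result = fn();                   // may throw; nothing is recorded on failure
  store.set(key, result);
  return result;
}

// Hypothetical agent loop with a fixed plan of tool calls, for illustration only.
function runAgent(store: StepStore, workflowId: string, tools: Array<() => string>): string[] {
  return tools.map((tool, i) => durableStep(store, workflowId, i, tool));
}

// Usage: the third tool fails on the first attempt only.
const calls: number[] = [0, 0, 0];
let flaky = true;
const tools = [
  () => { calls[0]++; return "searched"; },
  () => { calls[1]++; return "fetched"; },
  () => { calls[2]++; if (flaky) throw new Error("timeout"); return "summarised"; },
];

const store: StepStore = new Map();
try { runAgent(store, "wf-1", tools); } catch { flaky = false; } // first attempt fails at step 3
const outputs = runAgent(store, "wf-1", tools);                  // retry resumes at step 3
// outputs → ["searched", "fetched", "summarised"]; calls → [1, 1, 2]
```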

The practical recommendation: as soon as you adopt non-trivial agentic patterns, adopt a workflow orchestrator for those specific flows. This is still a single-axis movement (the rest of the app stays Flavour 3), but it’s no longer optional in the way it was for single-step LLM pipelines.

The tenancy and security wrinkle

Agentic systems compound the principal-propagation problem from the tenancy section.

A single LLM call has one principal: the user who initiated it. Set the context once at the worker boundary; you’re done.

An agentic flow may have:

  • The user who initiated the request (the original principal).
  • Tool calls that touch other systems (which need to act on behalf of that principal).
  • Sub-agents spawned by the orchestrator agent (which need their own scoped principals; in some designs, a sub-agent should not be able to do everything its parent can).
  • External LLM provider calls (which need their own auth, separately from the user’s principal).

Every hop is an opportunity for principal confusion. Architectural discipline:

  • Persist the original principal in the workflow state. Every step reads it from there. No step “re-derives” it.
  • Tool calls that touch internal systems use the persisted principal. Don’t let agents escalate their own privileges.
  • Tool calls that touch external systems use a separate service account or key, scoped to the action they’re performing, never the user’s credentials.
  • Sub-agents run with at most the same principal as their parent, ideally with a narrowed scope.
  • Adversarial tests for principal leakage become non-optional: a malicious or buggy agent should not be able to escalate or impersonate.

This isn’t theoretical. Agentic systems with shared infrastructure between tenants have leaked data through tool calls in real products. Bake the discipline in early. For the deep treatment of this whole class of failure (and the architecture that contains it), see The Confused Deputy Problem in Agent Systems.
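
The persistence, internal-tool, and sub-agent rules above fit in a few lines. Everything in the following sketch is hypothetical: `Principal`, `narrow`, `callInternalTool`, and the scope strings. The principal is read from persisted workflow state, never from anything the model emits. A sub-agent's scopes are the intersection of what it asks for and what its parent holds. Every internal tool call then checks tenant and scope against that principal. In a Postgres-first system the tenant check would typically also be enforced underneath by RLS.

```typescript
// Hypothetical principal shape, persisted in workflow state at the start.
interface Principal {
  userId: string;
  tenantId: string;
  scopes: ReadonlySet<string>;
}

// A sub-agent gets at most its parent's scopes: the intersection of what it
// requests and what the parent holds. Never a superset.
function narrow(parent: Principal, requested: string[]): Principal {
  const scopes = new Set(requested.filter((s) => parent.scopes.has(s)));
  return { userId: parent.userId, tenantId: parent.tenantId, scopes };
}

// Internal tools authorise against the persisted principal, not against
// tenant ids or permissions the model put in its tool-call arguments.
function callInternalTool(principal: Principal, scope: string, targetTenantId: string): string {
  if (targetTenantId !== principal.tenantId) throw new Error("cross-tenant access denied");
  if (!principal.scopes.has(scope)) throw new Error(`missing scope: ${scope}`);
  return `ok:${scope}`;
}

const root: Principal = { userId: "u1", tenantId: "t1", scopes: new Set(["contacts:read", "email:send"]) };
// The sub-agent asks for more than the parent has; it only gets the overlap.
const researcher = narrow(root, ["contacts:read", "billing:write"]);
```

The same checks double as the starting point for adversarial tests: assert that escalation and cross-tenant calls fail, not just that legitimate calls succeed.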

Cost as a first-class architectural concern

For sync CRUD, the cost of a request is negligible and roughly uniform. Per LLM call, it is neither.

Once you have agentic loops, a single user action can quietly burn $10–$100 in LLM costs if something goes wrong (or even when nothing goes wrong, for complex tasks). The architecture has to make cost visible and bounded:

  • Per-workflow cost limits, enforced at the orchestrator. Hard caps, not warnings.
  • Per-tenant cost tracking. You will need this for billing, support, and abuse prevention.
  • Per-step observability, with token counts, model versions, and costs on every record.
  • Circuit breakers on cost spikes: if a tenant’s daily spend exceeds 10× their average, halt and notify rather than continuing.

Job runners don’t naturally provide any of this. Workflow orchestrators don’t either, but they give you the place to add it cleanly. Either way, treating cost observability as “a problem for later” is a way to lose serious money to a single bad day.
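
Once a single place sees every LLM call, the hard cap and the circuit breaker are both small pieces of code. The following is a minimal sketch. `CostBudget` and `shouldHalt` are hypothetical names. The dollar estimates would come from token counts and your provider's pricing. The sketch also leaves out persisting the running total, which a cap needs in order to survive a restart.

```typescript
// Hypothetical per-workflow budget: every LLM call is charged before it runs,
// and a call that would exceed the hard cap is refused.
class CostBudget {
  private spentUsd = 0;
  private readonly capUsd: number;

  constructor(capUsd: number) {
    this.capUsd = capUsd;
  }

  charge(estimatedUsd: number): void {
    if (this.spentUsd + estimatedUsd > this.capUsd) {
      throw new Error(`budget exceeded: ${this.spentUsd.toFixed(2)} + ${estimatedUsd.toFixed(2)} > ${this.capUsd}`);
    }
    this.spentUsd += estimatedUsd;
  }

  get spent(): number {
    return this.spentUsd;
  }
}

// Circuit breaker: halt when today's tenant spend exceeds `factor` times the
// tenant's trailing daily average (10x, as in the list above).
function shouldHalt(todayUsd: number, recentDailyUsd: number[], factor = 10): boolean {
  if (recentDailyUsd.length === 0) return false; // no baseline yet; rely on the hard cap
  const avg = recentDailyUsd.reduce((a, b) => a + b, 0) / recentDailyUsd.length;
  return todayUsd > factor * avg;
}

const budget = new CostBudget(1.0);
budget.charge(0.4);
budget.charge(0.4);
// A third 0.4 charge would throw (0.8 + 0.4 > 1.0), so that call never runs.
```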

When agentic patterns are not the right tool

Worth saying explicitly, because the temptation to make everything agentic is real:

  • For deterministic single-call extraction (input → structured output), use a single LLM call with structured output mode. Agentic framing adds cost and latency for no benefit.
  • For classification, summarisation, and most “input → transformed output” tasks, agentic patterns are over-engineered.
  • For tasks where reliability matters more than capability, deterministic pipelines beat agentic ones today.

Agentic patterns earn their keep when the task requires multi-step reasoning, dynamic tool use, or collaboration between specialised contexts. If a single well-prompted LLM call works, prefer it. If a deterministic pipeline of LLM calls works, prefer it. Only when the task structure is dynamic should you reach for agents.

Other runtime paradigms: virtual actors, message buses, and where they fit

Beyond workflow orchestrators, there are two other major runtime paradigms you’ll encounter in architecture discussions: virtual actors and message-bus / choreography frameworks. Both solve real problems, both overlap with what workflow orchestrators do, and both are usually wrong for the audience of this post. Worth understanding what they are, when they earn their keep, and why DBOS-class orchestrators are usually the better default for the solo-founder / Postgres-for-everything stance.

Virtual actors (Orleans, Akka Cluster, Dapr Actors, Proto.Actor)

The virtual actor model says: every stateful entity in your system (a user, an order, a chat room, an IoT device, a game session) is an actor. Actors have their own mailbox, are processed serially by the runtime (no concurrent updates within one actor), are automatically activated on first message and deactivated when idle, and have their state persisted by the runtime. The cluster distributes actors across machines transparently. “Virtual” means you don’t manage their lifecycle; the runtime activates and deactivates them as needed.

This is a programming model, not just a library. Your domain logic is structured as actor types with message handlers; the runtime handles placement, persistence, mailbox ordering, and failover.

The unit is the entity. Per-entity state isolation, per-entity message serialisation, and per-entity location transparency are the dividends.
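
The serialisation dividend can be illustrated without a cluster. The following is a minimal sketch. `VirtualActors` and `ActorHandler` are hypothetical names, and the sketch models only per-entity mailbox ordering and lazy state creation, not the placement, persistence, deactivation, or failover that Orleans, Akka Cluster, or Dapr Actors provide. Messages to the same id run one at a time in send order, and different ids proceed independently.

```typescript
// Hypothetical handler: given current state and a message, produce the next state.
type ActorHandler<S, M> = (state: S, msg: M) => Promise<S>;

class VirtualActors<S, M> {
  private tails = new Map<string, Promise<void>>();
  private states = new Map<string, S>();
  private initial: () => S;
  private handler: ActorHandler<S, M>;

  constructor(initial: () => S, handler: ActorHandler<S, M>) {
    this.initial = initial;
    this.handler = handler;
  }

  send(id: string, msg: M): Promise<void> {
    const prev = this.tails.get(id) ?? Promise.resolve();
    // Chain onto the previous message for this id, so handlers never overlap.
    const next = prev.then(async () => {
      // State is created on first message, loosely mirroring "activation".
      const current = this.states.has(id) ? this.states.get(id)! : this.initial();
      this.states.set(id, await this.handler(current, msg));
    });
    // A failed handler must not block later messages to the same actor.
    this.tails.set(id, next.catch(() => undefined));
    return next;
  }

  stateOf(id: string): S | undefined {
    return this.states.get(id);
  }
}

// Usage: a read-modify-write handler that yields between read and write.
// Because messages to "room-1" are serialised, no increment is lost.
const counters = new VirtualActors<number, number>(
  () => 0,
  async (count, delta) => {
    await Promise.resolve(); // stand-in for I/O between reading and writing
    return count + delta;
  },
);

const done = Promise.all([
  counters.send("room-1", 1),
  counters.send("room-1", 1),
  counters.send("room-2", 5),
  counters.send("room-1", 1),
]);
// After `done` resolves: room-1 → 3, room-2 → 5
```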

When virtual actors earn their keep:

  • High-concurrency per-entity workloads. Multiplayer games where each game session is an actor receiving many concurrent player inputs. IoT systems where each device is an actor with its own state. Real-time collaboration where each document is an actor. High-frequency trading where each instrument is an actor.
  • The actor model’s natural serialisation (one message at a time per actor) makes per-entity concurrent-update problems disappear; you get sequencing for free.

Costs for the audience of this post:

  • Running a cluster (multiple coordinated processes with leader election, partitioning, failover). New operational substrate; not aligned with Postgres-for-everything.
  • The programming model is invasive. Your domain isn’t “plain functions” anymore; it’s structured as actor message handlers, with implications for how you reason about transactions, dependencies, and testing.
  • Most B2C/SaaS products don’t have the per-entity concurrency profile that justifies the model. Your contact records aren’t receiving thousands of concurrent updates per second.

The audience verdict: virtual actors are a powerful answer to a specific question (high-concurrency per-entity state with strong serialisation requirements). If you’re not building games, IoT platforms, real-time collaborative tools, or trading systems, you probably don’t have the question.

Message-bus / choreography frameworks (MassTransit, NServiceBus, Wolverine)

These are frameworks built on top of message brokers (RabbitMQ, Azure Service Bus, Kafka, etc.) that provide patterns for publish/subscribe, request/response over messaging, sagas (their term for orchestrated workflows), and state machines. MassTransit is the canonical example in .NET land; NServiceBus is its older sibling.

The unit is the message. Durable delivery, retry, dead-letter queues, and pub/sub topology are the dividends.

When message-bus frameworks earn their keep:

  • Enterprise integration scenarios where services from multiple teams or organisations need to communicate via durable, schema-versioned messages.
  • Environments where a message broker is already operationally established (common in .NET enterprise shops with Azure Service Bus or in JVM shops with Kafka).
  • Choreographed architectures where the bus is the integration backbone: Flavour 4 (choreographed monolith) or Flavour 6 (microservices) territory.

Costs for the audience of this post:

  • Requires a message broker: new operational substrate, not aligned with Postgres-for-everything.
  • The framework is shaped around brokers’ semantics (queues, topics, exchanges) which can feel heavyweight when your actual need is “I want to run this job later, reliably.”
  • Strongly tied to a language ecosystem (MassTransit, NServiceBus, and Wolverine are all .NET; analogous JVM frameworks are similarly ecosystem-locked).

The audience verdict: if you’re not already operating a message broker for unrelated reasons, the outbox pattern + a Postgres-native job runner (pg-boss, River, Oban) gives you 80% of what these frameworks provide with 10% of the operational tax. Reach for MassTransit-class frameworks when the broker exists anyway and you want strong patterns on top of it.
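
The following minimal in-memory sketch reduces the outbox pattern plus an idempotent consumer to its moving parts. The `Map`, arrays, and `Set` stand in for Postgres tables, and `placeOrder`, `relay`, and `sendConfirmationEmail` are hypothetical. None of pg-boss's, River's, or Oban's actual APIs are used. The sketch shows three properties: the domain write and the outbox row commit together, the relay delivers at least once, and the consumer deduplicates on the message id.

```typescript
// In-memory stand-ins for tables.
interface OutboxRow { id: string; topic: string; payload: string; sentAt: number | null }

const orders = new Map<string, string>();
const outbox: OutboxRow[] = [];
const processed = new Set<string>(); // consumer-side idempotency keys
const emailsSent: string[] = [];

// Write path: domain row and outbox row together. In Postgres this is one
// BEGIN ... COMMIT; here it is simply one synchronous function.
function placeOrder(orderId: string): void {
  orders.set(orderId, "placed");
  outbox.push({ id: `order-placed:${orderId}`, topic: "order.placed", payload: orderId, sentAt: null });
}

// Consumer keyed on the message id: a redelivered message is a no-op.
function sendConfirmationEmail(messageId: string, orderId: string): void {
  if (processed.has(messageId)) return;
  emailsSent.push(orderId);
  processed.add(messageId);
}

// Relay: deliver unsent rows, then mark them sent. A crash between the two
// steps causes redelivery, which is why the consumer must be idempotent.
function relay(now: number, crashBeforeMarking = false): void {
  for (const row of outbox) {
    if (row.sentAt !== null) continue;
    sendConfirmationEmail(row.id, row.payload);
    if (crashBeforeMarking) return;
    row.sentAt = now;
  }
}

placeOrder("o-42");
relay(1, true); // delivered, but the relay "crashed" before recording it
relay(2);       // redelivers the same message; the consumer ignores it
// emailsSent → ["o-42"]; outbox[0].sentAt → 2
```

In a real consumer, the idempotency key belongs in the same transaction as the consumer's own domain mutation. For a genuinely external side effect such as sending email, deduplication also depends on what the provider offers.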

How these relate to workflow orchestrators

DBOS, Temporal, virtual actors, and message-bus frameworks are overlapping abstractions over similar underlying mechanisms. You can model many things in each:

| Paradigm | Natural unit | What it gives you naturally | What you build on top |
| --- | --- | --- | --- |
| Workflow orchestrator (DBOS, Temporal) | The workflow: a multi-step business process | Durable per-step state, replay, retries, compensation, pausing | Per-entity state (model an entity as a long-lived workflow) |
| Virtual actor (Orleans, Akka) | The entity: a stateful thing receiving messages | Per-entity state isolation, message serialisation, automatic placement | Multi-step workflows (orchestrate sequences of actor calls) |
| Message-bus framework (MassTransit) | The message: durable inter-service communication | Reliable delivery, pub/sub, dead letters, broker patterns | Workflows (saga state machines on top of messages) |

Each is the best tool for its natural unit. Trying to model a long-running KYC workflow as a virtual actor is awkward; trying to model a multiplayer game’s session state as a workflow is awkward; trying to model either as a message-bus saga state machine is the most awkward of all.

For the audience of this post (solo founder, small team, Postgres-for-everything, monolith-as-starting-point), workflow orchestrators are the right default for the long-running async work in your system. DBOS specifically (or Oban Pro for Elixir, or pg-boss/River with hand-rolled orchestration for simpler cases) keeps you within the Postgres operational footprint. Virtual actors and message-bus frameworks are valid for specific problems but bring substantial operational tax that’s hard to justify until the problem is unambiguously theirs.

When you’d actually reach for actors or message buses

  • Virtual actors: you’re building something where per-entity concurrent state is the dominant concern: multiplayer game, IoT platform with millions of devices, real-time collaborative editor at scale, high-frequency trading. The model’s overhead pays for itself because the alternative (managing per-entity locks, mailboxes, and placement by hand) is much worse.
  • Message-bus frameworks: you’re integrating with an enterprise environment that has an established broker; or you’re in a language ecosystem where the framework is the standard (e.g., .NET shops); or you’re at a scale where dedicated broker infrastructure is justified by throughput.
  • Otherwise: stay with workflow orchestrators (preferably Postgres-native) plus the outbox pattern. Simpler, fewer moving parts, easier to operate, easier to reason about.

The LLM-producer wrinkle

When an LLM produces structured data (extracting events from speech, classifying tickets, generating reports), the architecture math changes:

  • Your schema becomes a prompt. The LLM reads your types as a description of how reality is shaped and produces data to match. The schema is no longer falsifiable by the producer’s outputs; it’s self-confirming. This is schema capture.
  • The “make invalid states unrepresentable” instinct becomes dangerous. Tight unions tell the LLM what categories exist. The LLM produces instances of every category whether or not those categories carve reality at its joints.
  • JSONB + strict application-boundary types becomes the right default, not a compromise. The LLM produces JSON; the application validates it; Postgres stores it. Strict types live at the application boundary and earn their keep only when downstream code mechanically branches on them.
  • The “operability test” replaces the unrepresentability principle. A type case earns its keep if you can name three operations that branch on it and not on its parent. If you can’t, collapse it. In LLM systems with throwaway typed events (the FRP variant of ES), the test applies to interpreters and read models: the prompt + reducer pair earns its keep if downstream operations branch on its output, not on the typed event itself (which is just an intermediate).

This pushes most LLM-driven systems toward Flavour 3 with JSONB payloads and strict application-boundary types, away from rigid relational schemas of LLM output, and away from classical ES where typed LLM-produced events are the canonical store.
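
The following is a minimal sketch of "JSONB plus strict application-boundary types". It assumes a hypothetical extraction whose only fields that downstream code branches on are a date and a follow-up flag. `ExtractedEvent` and `validateExtraction` are illustrative names, and a schema-validation library would normally replace the hand-written checks. The strict type covers only the branched-on fields. The whole validated object is kept as the payload that would go into a JSONB column.

```typescript
// Strict only where downstream code branches; everything else stays untyped.
interface ExtractedEvent {
  occurredAt: string;               // parseable date; used for sorting and filtering
  needsFollowUp: boolean;           // drives a downstream reminder job
  payload: Record<string, unknown>; // full LLM output, stored as JSONB
}

type Validation = { ok: true; value: ExtractedEvent } | { ok: false; errors: string[] };

// Validate untrusted LLM output at the application boundary.
function validateExtraction(raw: string): Validation {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["not valid JSON"] };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, errors: ["expected a JSON object"] };
  }
  const obj = parsed as Record<string, unknown>;
  const occurredAt = obj["occurredAt"];
  const needsFollowUp = obj["needsFollowUp"];
  const errors: string[] = [];
  if (typeof occurredAt !== "string" || Number.isNaN(Date.parse(occurredAt))) {
    errors.push("occurredAt must be a parseable date string");
  }
  if (typeof needsFollowUp !== "boolean") errors.push("needsFollowUp must be a boolean");
  if (typeof occurredAt === "string" && typeof needsFollowUp === "boolean" && errors.length === 0) {
    return { ok: true, value: { occurredAt, needsFollowUp, payload: obj } };
  }
  return { ok: false, errors };
}
```

Extra fields the model produces (a `mood`, say) pass through untouched in `payload`. They get no type of their own until some operation actually branches on them.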

The exception worth naming: the FRP-shaped variant of ES, where raw inputs (audio, image, document; the thing the LLM consumed) are canonical and typed events are throwaway derivations, is a good fit for LLM-driven products whose value involves iterating extraction interpretations over time. The classical-ES failure mode (canonical typed events the LLM produced) is avoided because the canonical events are observations, not interpretations. Re-derivation when prompts or schemas change is a normal operation. See Flavour 7’s treatment for the full shape.
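
The following minimal sketch shows the FRP-shaped variant in miniature. Observations are canonical and never rewritten, typed events are a pure function of observations plus an interpreter, and changing the interpreter means re-deriving rather than migrating. `Observation`, `Interpreter`, and `derive` are hypothetical. The keyword-matching `interpret` functions stand in for what would really be a prompt, a model, and a schema.

```typescript
// Canonical: raw observations (here, transcript text). Never rewritten.
interface Observation { id: string; text: string }

// Derived and throwaway: typed events tagged with the interpreter that produced them.
interface DerivedEvent { observationId: string; kind: string; interpreter: string }

// Stand-in for "prompt + model + schema", versioned so derivations are traceable.
interface Interpreter { version: string; interpret: (o: Observation) => string[] }

// Pure derivation: the same observations and interpreter always give the same events.
function derive(observations: Observation[], interp: Interpreter): DerivedEvent[] {
  const events: DerivedEvent[] = [];
  for (const o of observations) {
    for (const kind of interp.interpret(o)) {
      events.push({ observationId: o.id, kind, interpreter: interp.version });
    }
  }
  return events;
}

const observations: Observation[] = [
  { id: "obs-1", text: "took the dog for a walk, felt tired after" },
  { id: "obs-2", text: "call the dentist tomorrow" },
];

// v1 recognises only activities; v2 also recognises moods and reminders.
const v1: Interpreter = { version: "v1", interpret: (o) => (o.text.includes("walk") ? ["activity"] : []) };
const v2: Interpreter = {
  version: "v2",
  interpret: (o) => [
    ...(o.text.includes("walk") ? ["activity"] : []),
    ...(o.text.includes("tired") ? ["mood"] : []),
    ...(o.text.includes("tomorrow") ? ["reminder"] : []),
  ],
};

const before = derive(observations, v1); // 1 event
const after = derive(observations, v2);  // 3 events, re-derived from the same observations
```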

Choosing your flavour

A blunt decision guide. Note that real applications usually answer “yes” to several of these at once; the decision is what combination of mechanisms you adopt, not a single flavour from the menu.

  1. Internal tool, want to ship in a weekend? Flavour 0 (Supabase or PostgREST). Stop reading.
  2. Solo or two-person CRUD app, will probably stay small? Flavour 2 (transactional script with composed handlers). Stop reading.
  3. You expect multiple bounded contexts (auth, billing, domain) and want them to grow without rotting? Flavour 3 as the foundation. Everything below is layered on top.
  4. Some flows are async (emails, webhooks, batch jobs)? Add a job runner + outbox pattern to Flavour 3 for those specific edges. Don’t convert everything.
  5. The producer of your data is an LLM? Use JSONB payloads and strict types at the application boundary. Resist tight relational schemas for LLM output.
  6. You have long-running user-perceived async work (single LLM calls, video/image processing)? Same job runner. Treat the worker with the same engineering discipline as the HTTP layer.
  7. You have multi-step LLM pipelines (transcribe → extract → enrich → notify, fallibility per step, user-visible progress)? Add a workflow orchestrator (DBOS, Temporal) for those specific pipelines. Don’t full-convert to Flavour 5; just add the orchestration runtime where it earns its keep.
  8. You have agentic flows (LLM tool-calling loops, agent delegation, dynamic step graphs)? Workflow orchestrator becomes close to mandatory for those flows. Per-workflow cost limits, per-step observability, and adversarial principal-leakage tests become non-optional.
  9. Your UX needs streaming (token-by-token LLM output, live progress updates, chat-style interactions)? Add SSE endpoints at the HTTP boundary; reach for WebSockets only if bidirectional realtime is required. Streaming is mostly an edge concern; it doesn’t change your underlying flavour.
  10. Your application is fundamentally workflow-shaped (workflows are the dominant interaction, not the exception)? Flavour 5 in earnest. The orchestration runtime becomes a primary infrastructure choice.
  11. Your testing strategy depends on injecting fakes for slow/expensive I/O, or you want offline-capable local dev against real-software substitutes (Mailpit, MinIO, etc.)? Adopt ports/adapters for those specific ports. The fake adapter for tests and the local-dev adapter (often a containerised real service) both count as real second adapters and pay for the abstraction. You don’t need to go full Flavour 5 to do this.
  12. You have hard regulatory/temporal requirements (financial ledger, medical records, legal audit)? Flavour 7 for the parts that need it. Most of the system can stay Flavour 3.
  13. You have 50+ engineers and bounded contexts that deploy independently? Flavour 6 for the contexts that need it. Even then, only for those.
  14. Your product needs to run domain logic on multiple device classes (server + desktop + mobile + offline)? Flavour 5 starts to earn its abstraction tax. Otherwise it doesn’t.

The default for almost everyone reading this is Flavour 3 as the foundation, plus a job runner for async edges, plus JSONB payloads where structure is in flux, plus a workflow orchestrator selectively for multi-step or agentic flows, plus SSE for streaming UX where applicable, plus selective ports/adapters for the I/O dependencies that benefit from test fakes. Most apps need three or four of these mechanisms simultaneously; the architecture’s job is to host them all without forcing one pattern through the others.

What this means in practice

  • Don’t put business logic in triggers. Put data integrity in constraints.
  • Use foreign keys across module boundaries. Don’t use cascades across them.
  • Run one transaction per request, threaded explicitly through your modules.
  • Hold long-running calls (LLMs, external APIs) outside the transaction. Open tx → read → close tx → long call → open tx → write → close tx.
  • Treat the worker code path with the same discipline as the HTTP path. Tenancy context, transaction boundaries, structured logging, error handling, observability, tests.
  • Make every async job idempotent. Job key = the natural id of the work (record id, message id). Retries must not duplicate domain mutations.
  • The principal travels with the work. Job payloads carry the actor; consumers re-establish RLS/auth context before doing anything.
  • Let each interaction pick its own async profile. Don’t force sync work through a queue, and don’t force long work through a request handler.
  • Keep your sync CRUD (auth, billing, settings) sync. The fashionable temptation to make everything async makes the boring 80% of your app harder to reason about.
  • Use Postgres schemas to physically separate module tables.
  • Restrict each module’s view of the database type so cross-module access is a compile error.
  • Layer your tenancy defences: RLS + access envelope + RBAC/ReBAC, all three. Each catches a different class of bug. They are not alternatives.
  • Set RLS context once at the request boundary. Re-establish it at every async boundary from the principal stored in the job payload.
  • Decouple the tenant concept from the user concept early, even if tenant_id == user_id for now.
  • Treat the job payload as authenticated because the write path was authenticated. The worker re-establishes RLS and envelope from the payload but doesn’t re-authenticate the principal; the trust comes from controlling who can write to the job table, not from the worker’s own checks.
  • Use JSONB for shapes the LLM produces; validate at the application boundary.
  • Add the outbox pattern only at the edges that are async.
  • Add a workflow orchestrator only when you have multi-step pipelines that need durable state, compensation, or pausing, and adopt it only for those flows, not the whole app.
  • For agentic flows, treat the workflow orchestrator as close to mandatory. Per-workflow cost limits, per-step observability, and principal-leakage tests are not optional.
  • Track LLM cost per-call, per-tenant, per-workflow from day one. The day you need this data and don’t have it is expensive.
  • Use SSE before WebSockets. Streaming endpoints sit outside transactional flows. Stream observable progress; persist canonical state at commit.
  • Don’t introduce ports/adapters speculatively. Do introduce them when (a) your testing strategy depends on injecting fakes for slow/expensive I/O (LLMs, payment providers, email), (b) you want offline-capable local development against real-software substitutes (Mailpit, MinIO, ElasticMQ, etc.), or (c) you have or expect multiple production adapters. Any one of these justifies the abstraction for the relevant port.
  • Prefer integration tests against real infrastructure (testcontainers, ephemeral DBs in CI) over hand-built fakes for ports with leaky semantics (SQL, full-text search). Reserve in-memory fakes for ports with naturally clean interfaces.
  • Default to command-then-poll for async UX (POST returns a job ID; client polls a status endpoint). Reach for SSE only when polling isn’t enough.
  • Add OCC tokens (version column or updated_at) to every mutable record from day one. Treat single-user-multi-device as a real concurrency case, not an edge case.
  • Protect manual corrections from being overwritten by automated re-runs (flag the field, layer the storage, or surface the conflict explicitly).
  • Don’t introduce event sourcing until you have a temporal-query requirement that an audit log can’t serve.
  • Prefer deterministic LLM pipelines over agentic ones when both work. Agentic patterns earn their keep on dynamic tasks; over-applying them buys cost and latency.
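
Two of the rules above work together: holding long calls outside the transaction, and putting OCC tokens on mutable records. Combined, they are also what protects manual corrections. The following is a minimal in-memory sketch. `table`, `read`, `writeIfUnchanged`, and `summarise` are hypothetical, and a comment shows the SQL shape the conditional write would take against a version column. The automated write is refused if anything else wrote the row while the slow call was running.

```typescript
// In-memory stand-in for a table with an OCC version column.
interface Row { id: string; summary: string; version: number }
const table = new Map<string, Row>();

// Short "transaction" 1: read the row and capture its version token.
function read(id: string): Row {
  const row = table.get(id);
  if (!row) throw new Error(`no row ${id}`);
  return { ...row };
}

// Short "transaction" 2: conditional write, equivalent in shape to
//   UPDATE notes SET summary = $1, version = version + 1 WHERE id = $2 AND version = $3
// Returns false when the version moved on (0 rows updated).
function writeIfUnchanged(id: string, expectedVersion: number, summary: string): boolean {
  const row = table.get(id);
  if (!row || row.version !== expectedVersion) return false;
  table.set(id, { id, summary, version: row.version + 1 });
  return true;
}

// open tx → read → close tx → long call (no locks held) → open tx → write → close tx
function summarise(id: string, slowLlmCall: (text: string) => string): "written" | "conflict" {
  const snapshot = read(id);
  const summary = slowLlmCall(snapshot.summary);
  return writeIfUnchanged(id, snapshot.version, summary) ? "written" : "conflict";
}

table.set("n1", { id: "n1", summary: "raw notes", version: 1 });
const first = summarise("n1", (text) => text.toUpperCase()); // "written"; version is now 2
// A manual edit lands while the LLM call is in flight: the automated write loses.
const second = summarise("n1", () => {
  writeIfUnchanged("n1", 2, "manual correction");
  return "llm summary";
}); // "conflict"; the manual correction survives
```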

The architecture you want is the one that lets you delete code when you’re wrong. Most “advanced” architectures are bets that you’re right; the simple ones are bets that you’re going to learn.

// Context & author

You are reading Field Notes by Auxil

Auxil is an independent software systems consultancy and active product factory operated by veteran software practitioner Tim Farland alongside a vetted peer network of senior specialists. Based on Waiheke Island, Auckland, we design, build, and audit high-stakes SaaS systems and production-grade AI pipelines globally.


Tim Farland

Operator / Architect / Engineer