
Designing for Agents: Architecture That Survives High-Velocity Change

AI-generated code will surface every assumption you never wrote down. If your architecture requires shared implicit knowledge to not break things, agents will break things.

March 20, 2026
9 min read
Architecture
Agentic Development
Engineering Fundamentals
CI/CD
Melissa Benua

Engineering Leader & Speaker

My first job out of college was at Boeing. I worked on airplane software, and the rigor was unlike anything I've experienced before or since - multiple levels of validation for every change, review chains that felt endless, documentation requirements that seemed almost paranoid.

I was frustrated by it for a long time. Then a very senior tester sat me down and said: "This is an airplane. If we make a mistake, the airplane crashes. People could die."

That conversation gave me a framework I've used ever since: relative risk. The cost of an architectural failure scales with what's at stake. For a photo-sharing site, a bad deploy might mean a few users see a 500 error page - annoying, but nothing life-or-death. For an airplane, it means something far worse. The rigor of your architecture should match the cost of failure.

Here's what's changed in agentic development: the cost of getting architecture wrong has gone up significantly, because the rate at which an agent can introduce a bad assumption into your codebase is much higher than a human developer ever was. An architect's instinct used to be enough to catch problems before they landed. Now the architecture itself has to enforce the constraints - because the thing submitting changes doesn't have instincts, only context.

I learned that lesson again years later at mParticle, at a very different scale and with stakes of a very different kind.

mParticle's platform ingests and routes customer data - event streams, user attributes, identity - for some of the largest apps in the world. Customers rely on us for complete data fidelity. Not approximately complete. Every event, every attribute, every field, for 100% of their data, period. The schema adherence across our microservices was tight by necessity: with data at that scale, across that many services, a schema change that wasn't backward-compatible didn't just break service-to-service communication. It could cause irrecoverable data loss. Data that is gone is gone. There's no rollback for that.

We caught one of those in a testing environment. It never made it to customers in any significant way, and we were fortunate. But the near-miss was clarifying. From that point forward, we treated schema compatibility as a hard constraint enforced in code - not a checklist item, not a careful-reviewer call, but unit tests that would actively prevent you from merging a change that broke backward or forward compatibility. The test suite made it structurally impossible to make that class of mistake without knowing you were doing it.
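Here's a minimal sketch of what that kind of merge-blocking guard can look like. The schema shape and the rules are illustrative, not mParticle's actual implementation - the core idea is simply that a change is only compatible if it never removes a field and never adds a new *required* one:

```python
# Hypothetical schema format: a map of field name -> {"required": bool}.
OLD_SCHEMA = {"user_id": {"required": True}, "event_name": {"required": True}}
NEW_SCHEMA = {
    "user_id": {"required": True},
    "event_name": {"required": True},
    "session_id": {"required": False},  # additive and optional: safe
}

def compatibility_violations(old: dict, new: dict) -> list[str]:
    """Return a list of violations; an empty list means safe to merge."""
    violations = []
    for name in old:
        if name not in new:
            violations.append(f"removed field: {name}")
    for name, spec in new.items():
        if name not in old and spec.get("required"):
            violations.append(f"new required field: {name}")
    return violations

# Run in CI as a unit test: a red build, not a reviewer's memory,
# is what stops the incompatible change.
assert compatibility_violations(OLD_SCHEMA, NEW_SCHEMA) == []
```

A check like this turns "we always knew not to do that" into something the build enforces on every pull request, regardless of who - or what - wrote the change.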

That suite has caught the same failure pattern multiple times since. Each one became an incident that wasn't.

I'll take a million incidents that weren't over a single incident that was.

The Boeing lesson and the mParticle lesson are the same lesson at different altitudes: the rigor of your architecture should match what it costs when it breaks. What's changed in an agentic environment is that the cost hasn't changed, but the rate at which a bad assumption can be introduced has gone up dramatically. The architecture has to do more of the work that experience used to do.

What Changes When Code Velocity Increases

The core insight is this: in a low-velocity development environment, a lot of architectural complexity can live in people's heads. Senior engineers carry mental models of the system - where the sharp edges are, which APIs have undocumented quirks, which database tables are soft-deleted vs. hard-deleted, why you don't touch the transaction lock timeout. That knowledge keeps things from breaking even when the written architecture doesn't capture it.

Increase the velocity significantly - whether through a larger team, a tighter release cadence, or AI agents generating changes faster than a human could review - and the tacit knowledge stops being sufficient. Things break in ways that make experienced engineers say "but we always knew not to do that." The problem is that "always knew" was never written down, and now something is generating changes that doesn't always know.

The architectural decisions that matter most in a high-velocity environment are the ones that eliminate the need for tacit knowledge. Explicit contracts. Versioned APIs. Schema migrations that are backward-compatible by construction. Blast radius limits that prevent a bad change in one service from cascading into others. Feature flags that let you ship code before you activate it.

These aren't exotic practices. They're the same things that made continuous delivery safe for human developers. They just matter more now.

API Contracts: Writing Down What You've Always Known

An API is a contract between services. When it's honored, things work. When it breaks - silently, because nobody updated the contract - you find out in production.

The practices that protect contracts are mostly about explicitness. Version your APIs. Don't change behavior under an existing version - add a new one. Treat any breaking change as a deployment event, not just a code change. And document the contract: not just the endpoints, but the semantics. "This endpoint returns 404 if the user doesn't exist, not 200 with an empty body." An AI agent that doesn't know the semantic contract will implement what seems logical from the HTTP spec, not what your system actually does.
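A toy sketch of what "add a new version, don't change the old one" looks like in practice - the handler names, routes, and data shapes here are invented for illustration:

```python
# v1's contract is frozen: 404 when the user doesn't exist,
# never 200 with an empty body.
def get_user_v1(user_id: str, db: dict):
    user = db.get(user_id)
    if user is None:
        return 404, None
    return 200, {"id": user_id, "name": user["name"]}

# New behavior ships as v2; v1 callers are untouched.
def get_user_v2(user_id: str, db: dict):
    status, body = get_user_v1(user_id, db)
    if status == 200:
        body["email"] = db[user_id].get("email")  # additive change only
    return status, body

users = {"u1": {"name": "Ada", "email": "ada@example.com"}}
assert get_user_v1("missing", users) == (404, None)
assert get_user_v2("u1", users)[1]["email"] == "ada@example.com"
```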

Contract-based testing is the automated enforcement arm of this. If you have tests that validate the API contract between your services - not the implementation, the contract - then a change that breaks the contract shows up in CI before it hits production. This is one of the most valuable tests you can have, and one of the most commonly skipped.
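In its simplest form, a contract test pins down the agreed shape and semantics - not the implementation behind them. This sketch uses an in-memory stand-in for a real service; real contract-testing tools do the same thing across process boundaries:

```python
# The contract is data: what callers are allowed to depend on.
CONTRACT = {
    "status_when_missing": 404,          # the semantic, not just the endpoint
    "required_fields": {"id", "name"},   # fields every 200 response must carry
}

def user_service(user_id: str):
    """Stand-in for the real service; swap implementations freely."""
    users = {"u1": {"id": "u1", "name": "Ada"}}
    if user_id not in users:
        return 404, None
    return 200, users[user_id]

def test_contract():
    status, _ = user_service("nobody")
    assert status == CONTRACT["status_when_missing"]
    status, body = user_service("u1")
    assert status == 200
    assert CONTRACT["required_fields"] <= body.keys()

test_contract()
```

Because the test asserts against the contract rather than the code, a refactor passes freely while a contract break fails in CI - which is exactly the distinction you want.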

Backward and Forward Compatibility

In a continuous delivery system, you can't guarantee that every service deploys at the same time. The caller might be on the old version while the callee has already updated, or vice versa. In an agentic system, this is even more acute - agents ship changes fast, and your deployment pipeline has to be able to handle services being at different versions simultaneously.

This means: additions are safe, removals are not. You can add a new field to an API response safely - callers that don't know about it will ignore it. You cannot remove a field that callers depend on, even if you think nobody is using it anymore. The same principle applies to database schemas: add columns, don't remove them until you've verified nothing reads them.
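The asymmetry is easy to demonstrate. In this sketch, an old caller that extracts only the fields it knows about keeps working when the callee adds a field - but removing `name` would break it immediately:

```python
def old_caller_parse(response: dict) -> tuple:
    # Extracts only known fields; silently ignores anything new.
    return response["id"], response["name"]

v1_response = {"id": "u1", "name": "Ada"}
v2_response = {"id": "u1", "name": "Ada", "email": "ada@example.com"}  # field added

# Addition is invisible to the old caller:
assert old_caller_parse(v1_response) == old_caller_parse(v2_response)

# Removal is a KeyError waiting in production:
# old_caller_parse({"id": "u1"})  # would raise KeyError: 'name'
```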

Forward compatibility - your old code handling data written by a newer version - is harder. The easiest path is to design your data formats to be extensible from the start. Use nullable fields rather than required ones for new additions. Treat unknown fields as ignorable rather than errors. Build in version markers so you can distinguish old data from new.
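All three tactics fit in one small "tolerant reader." The event shape here is hypothetical; the point is that old code survives contact with data written by a newer version:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Event:
    schema_version: int                 # version marker: old vs. new data
    name: str
    session_id: Optional[str] = None    # new additions are nullable, not required

def parse_event(raw: dict) -> Event:
    # Tolerant reader: pull only the fields we know,
    # treat unknown fields as ignorable rather than as errors.
    return Event(
        schema_version=raw.get("schema_version", 1),  # pre-marker data defaults to v1
        name=raw["name"],
        session_id=raw.get("session_id"),
    )

# Data written by a newer version, with a field this code has never seen:
newer = {"schema_version": 3, "name": "click", "session_id": "s9", "region": "eu"}
assert parse_event(newer).name == "click"
```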

These constraints are easy to forget when you're moving fast. They're the first thing an AI agent will violate if you don't make them explicit - because the internet doesn't write code this carefully either.


Blast Radius: Limiting What Can Go Wrong

Every architectural decision has a blast radius - the scope of what breaks when it breaks. A bug in a shared library can take down every service that imports it. A schema migration gone wrong can corrupt data across multiple consumers. A deployment failure in a foundational service can cascade across everything that depends on it.

Limiting blast radius is about designing your system so that a failure in one place can't become a failure everywhere. The tactics are familiar: service isolation, circuit breakers, feature flags, gradual rollouts, and - critically - staging environments where you can observe the blast before it reaches production.

In an agentic environment, blast radius matters more because changes come faster. There's less time between "code was written" and "code is in production" for a human to notice something is wrong. If your architecture has no blast radius controls, a bad agent-generated change can hit production before anyone realizes it's bad.

Feature flags are particularly valuable here. They let you ship code that's off by default, activate it for a subset of users, and roll it back instantly if something goes wrong - without a new deployment. This decouples code deployment from feature activation, which means you can ship more often without betting everything on each ship.
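A bare-bones sketch of a percentage rollout flag - real systems read the config from a service rather than a module-level dict, and the flag names here are invented:

```python
import hashlib

# Flag state is configuration, not code: changing it is not a deployment.
FLAGS = {"new_checkout": {"enabled": True, "rollout_percent": 10}}

def is_enabled(flag: str, user_id: str) -> bool:
    cfg = FLAGS.get(flag)
    if not cfg or not cfg["enabled"]:
        return False
    # Stable hash puts each user in the same bucket on every request,
    # so the rollout cohort doesn't churn between page loads.
    digest = hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < cfg["rollout_percent"]

# Rolling back is flipping a number, not shipping a build:
FLAGS["new_checkout"]["rollout_percent"] = 100
assert is_enabled("new_checkout", "user-42")
FLAGS["new_checkout"]["enabled"] = False
assert not is_enabled("new_checkout", "user-42")
```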

Supply Chain: An Old Problem at New Velocity

One question that comes up a lot in discussions about agentic development is supply chain security - what happens when agents upgrade packages and introduce vulnerabilities. The short answer is: this isn't a new problem, it's the old problem moving faster.

Developers have always done willy-nilly package upgrades. They've always introduced dependencies without checking the full supply chain. SolarWinds wasn't caused by AI. Package injection attacks predate large language models by decades. What's new is the velocity - agents can introduce and upgrade dependencies faster than a human would, which means your supply chain validation has to be automated and in the critical path, not a periodic audit.

This means: pin your dependencies, validate them on every change, and have automated tooling that flags new packages before they land in main. Not because AI makes this problem worse in kind, but because velocity makes it worse in degree.
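The gate itself can be very simple. This sketch compares a pinned lockfile against an approved set and fails the build on anything new - the package names and the allow-list are illustrative:

```python
# Approved, pinned dependencies - reviewed once, then enforced on every change.
APPROVED = {"requests==2.31.0", "flask==3.0.0"}

def unapproved_dependencies(lockfile_lines: list[str]) -> list[str]:
    """Return pinned deps that aren't on the allow-list; non-empty fails CI."""
    pinned = {
        line.strip()
        for line in lockfile_lines
        if line.strip() and not line.strip().startswith("#")
    }
    return sorted(pinned - APPROVED)

lock = ["requests==2.31.0", "flask==3.0.0", "leftpad==0.1.0"]  # agent added one
assert unapproved_dependencies(lock) == ["leftpad==0.1.0"]
```

The check runs on every pull request, so a new or upgraded package is flagged at the moment it's introduced - not discovered months later in an audit.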

The Architectural Checklist

Before an agentic workflow touches a codebase, it's worth asking a few questions:

Are your API contracts written down and tested? Or do they live in institutional memory?

Are your schema migrations backward-compatible by default? Or do you sometimes need to coordinate deploys?

Do you have blast radius controls - feature flags, service isolation, rollback mechanisms - or does a bad change affect everything?

Is your supply chain validated automatically on every PR, or periodically and by hand?

If the answer to any of these is "we rely on experienced engineers to know where the sharp edges are" - that knowledge needs to be made explicit before you add velocity. Agents don't absorb tribal knowledge. You have to write it down.


This is post 3 of 7 in The Boring Parts Matter: Engineering Fundamentals for the Agentic Era.
