At the STARWEST Leadership Summit last year, I opened my talk by asking the room to raise their hands if they'd been feeling more existential dread about their careers in the last few months than they had in the last decade. Most hands went up.
I raised mine too.
I've been doing this for more than twenty years. I've been through the shift from waterfall to agile, from on-premises to cloud, from siloed teams to DevOps. I was part of the team at Bing when Live Search went down and Bing went up - early web search at a scale where if something broke, a lot of people knew about it immediately. I built continuous delivery pipelines at a gaming startup from scratch, the kind where you're figuring out as you go what "deploy safely thirty times a day" actually requires in practice. None of those transitions felt as fast or as disorienting as what's happened in the last eighteen months.
And yet.
When I look back at the talks I've been giving since 2016 - on CI/CD, test design, observability, coding for compatibility, building resilient teams - almost nothing I said is wrong. The principles are intact. The frameworks still apply. The practices I was advocating for then are the same practices that make agentic development work now.
When I started writing this series, I was VP of Engineering at Rokt. As this post goes up, I'm making a different announcement: I've left that role to co-found Vega - and building it has been the most direct test I've ever run of everything in this series.
Building AI-first from scratch is different from bringing AI into an existing system, and the difference is harder than I expected. With a legacy codebase, you have guardrails whether you built them intentionally or not - tests that exist, conventions that are enforced, architecture that pushes back. With a greenfield AI-first build, there's nothing. The AI is working from the average of the internet, which is a decent starting point for generic code and a liability for a real product with real customers and real data. If you don't build the guardrails yourself, from the start, you are building on sand.
So that's what we did first. We came off the MVP prototype and stopped. How does it deploy? Is there a safe environment to test changes before they hit production? How does it test? How do we lock in requirements so the AI is implementing what we actually specified? How do we make it structurally impossible for an agent to touch a production database without explicit human authorization - and how do we also make it possible to get AI help in production in a break-glass situation without dismantling all of that?
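To make that last constraint concrete, here's a rough sketch of what a structural gate can look like. This is not our actual implementation - the names (guard_database_action, BreakGlassApproval) are made up for illustration. The point is that the rule lives in code every agent-facing tool has to pass through, not in a convention someone has to remember.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


class ProductionAccessDenied(Exception):
    """Raised when an agent-initiated action targets production without approval."""


@dataclass
class BreakGlassApproval:
    """A human-issued, time-boxed authorization, recorded in an audit log."""
    approver: str
    reason: str
    expires_at: datetime  # timezone-aware

    def is_valid(self) -> bool:
        return datetime.now(timezone.utc) < self.expires_at


def audit_log(environment: str, initiated_by: str,
              approval: BreakGlassApproval | None) -> None:
    # Placeholder: in practice this writes to an append-only store.
    print(f"[audit] {environment} access by {initiated_by}, approval={approval}")


def guard_database_action(environment: str, initiated_by: str,
                          approval: BreakGlassApproval | None = None) -> None:
    """Every tool the agent can use to touch a database goes through this gate.

    Non-production environments pass through. Agent-initiated production access
    requires a valid, human-issued break-glass approval, and the grant is logged.
    """
    if environment != "production":
        return
    if initiated_by == "agent" and (approval is None or not approval.is_valid()):
        raise ProductionAccessDenied(
            "Agents may not touch production data without an explicit, "
            "unexpired human approval."
        )
    audit_log(environment, initiated_by, approval)
```

The break-glass path still exists, but it's a human-issued, time-boxed grant that leaves an audit trail - which is what lets you get AI help during a production incident without quietly dismantling the rule.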
There's a real tension here that nobody talks about enough. At a startup, it's very easy to spend all your time on the fun part - building tooling, refining guardrails, optimizing the agentic pipeline - instead of shipping the thing customers need. The answer isn't to wait until the foundation is perfect. The answer is to start somewhere, anywhere, and iterate. A little context, then more context. A few guardrails, then more guardrails. Get a working loop and tighten it.
Here's where that iteration has gotten us. We have a Slack channel where anyone - technical or not - can report a fit-and-finish issue. Wrong text, overlapping UI element, minor copy error. When something gets reported, the system evaluates whether it's genuinely fit-and-finish scope, confirms the change isn't complex enough to require human design, makes the change, opens a PR, runs checks, and then runs a second pass with separate agents specifically tasked with verifying the change did only what it was supposed to do. There are escape valves at every inspection point. If anything looks wrong, it stops. When everything looks right, a human does a cursory review and merges.
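If it helps to see the shape of that loop, here's a simplified sketch. The stage names and stubbed checks are stand-ins, not our production pipeline; what matters is that every checkpoint is an escape valve, and the loop stops the moment any of them says no.

```python
from dataclasses import dataclass


@dataclass
class Report:
    """A fit-and-finish issue reported in the Slack channel."""
    text: str
    reporter: str


# Stubs standing in for the real steps: agent calls, a CI run, and a second
# pass by separate verification agents. Each returns True when the loop may continue.
def is_fit_and_finish_scope(report: Report) -> bool: return True
def does_not_require_human_design(report: Report) -> bool: return True
def agent_change_opens_clean_pr(report: Report) -> bool: return True
def ci_checks_pass(report: Report) -> bool: return True
def verifiers_confirm_only_intended_change(report: Report) -> bool: return True


ESCAPE_VALVES = [
    is_fit_and_finish_scope,
    does_not_require_human_design,
    agent_change_opens_clean_pr,
    ci_checks_pass,
    verifiers_confirm_only_intended_change,
]


def handle_report(report: Report) -> str:
    """Run the report through every checkpoint; stop the moment anything looks wrong."""
    for check in ESCAPE_VALVES:
        if not check(report):
            return f"stopped at {check.__name__}: escalate to a human"
    return "ready for cursory human review and merge"


print(handle_report(Report("Button label overlaps the icon on mobile", "sales")))
```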
No hops. A customer notices something. A fix is ready. One human touchpoint at the end, not a chain of handoffs from customer to support to PM to engineer. Two engineers, doing work that previously required a team of ten. Features that would have taken a staff-level engineer months shipping in a week.
That's not because we found the best AI model. It's because we built the specs, the architecture, the tests, the CI, and the monitoring first - and then AI had something real to work with.
The stakes are higher. The velocity is faster. The margin for error is smaller. But the fundamentals? Identical.
What This Series Argued
Six posts ago, I made a claim: the teams moving fastest with agentic development are not the ones who found the best AI model. They're the ones who already had strong engineering fundamentals in place.
Here's how that played out across the series.
Planning and speccing (post 2) is the highest-ROI investment in an agentic workflow because context is the variable the agent can't supply itself. A well-written spec with clear acceptance criteria, an example to follow, and explicit constraints is the difference between 80% of the right thing and 80% of whatever the AI guessed you wanted. The teams doing spec-based agentic development - writing real specs before the agent writes real code - are the ones getting consistent, usable output.
Architecture (post 3) is where implicit knowledge becomes a liability. Every assumption that's always been understood but never written down is an assumption an agent will eventually violate. The architectural practices that make this safe - API contracts, backward compatibility, blast radius controls, feature flags - aren't new ideas. They're the same ideas that made continuous delivery safe for human developers. The difference is that now they're not optional best practices. They're the floor.
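As one small example of a blast radius control: any behavior change, agent-written or not, can ship behind a flag with a deliberately small initial rollout. A minimal sketch, with a hand-rolled flag check standing in for whatever flag system you actually use:

```python
import hashlib


def is_enabled(flag: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministically bucket users so a flag exposes a change to a fixed slice."""
    bucket = int(hashlib.sha256(f"{flag}:{user_id}".encode()).hexdigest(), 16) % 100
    return bucket < rollout_percent


def render_checkout(user_id: str) -> str:
    # The new, agent-written path rolls out to 5% of users; the old path stays the default.
    if is_enabled("new_checkout_summary", user_id, rollout_percent=5):
        return new_checkout_summary(user_id)
    return legacy_checkout_summary(user_id)


def new_checkout_summary(user_id: str) -> str: return f"new summary for {user_id}"
def legacy_checkout_summary(user_id: str) -> str: return f"legacy summary for {user_id}"


# Roughly 5% of users land in the new path, and the slice is stable across requests.
print(sum(is_enabled("new_checkout_summary", f"user-{i}", 5) for i in range(1000)),
      "of 1000 users see the new path")
```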
Testing (post 4) is where flakiness goes from annoying to catastrophic. A flaky test that a human developer ignores becomes a green test suite that an agent has optimized against, validating completely broken behavior. The principles of fast, reliable, and specific tests, and the discipline of quarantining flakiness immediately, matter more in this environment, not less.
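If your suite happens to be pytest, one low-ceremony way to enforce the quarantine is a dedicated marker that the gating CI run deselects, so a known-flaky test can't turn the suite red - or, worse, dishonestly green - while it's being fixed:

```python
# The marker is registered in pyproject.toml so pytest doesn't warn about it:
# [tool.pytest.ini_options]
# markers = ["quarantine: known-flaky; excluded from the gating CI run"]

import pytest


@pytest.mark.quarantine  # tracked in an issue; gating CI runs `pytest -m "not quarantine"`
def test_order_confirmation_email_sends():
    ...
```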
CI (post 5) has become the agent's primary manager - the only feedback mechanism available for code that was written without your team's tacit knowledge. CI that gives vague, scattered, slow feedback isn't just inconvenient. It's a bottleneck in a workflow where the agent is supposed to self-correct and the human is supposed to review the result. The investment in clear, consolidated, deterministic CI feedback pays off immediately and keeps compounding.
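What "consolidated" can mean in practice: a final CI step that gathers every failure into one structured artifact the agent reads first, instead of leaving it to scrape logs. The format below is hypothetical - the idea is just that each failure says where it is, what to fix, and how to reproduce it:

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Failure:
    step: str        # which CI job failed
    location: str    # where the problem is
    message: str     # what to fix, in one sentence
    command: str     # how to reproduce locally


def write_summary(failures: list[Failure], path: str = "ci-summary.json") -> None:
    """Emit one machine-readable artifact an agent (or a human) reads first."""
    with open(path, "w") as f:
        json.dump({"failed": bool(failures),
                   "failures": [asdict(x) for x in failures]}, f, indent=2)


write_summary([
    Failure("unit-tests", "billing/test_invoice.py::test_rounding",
            "expected 10.00, got 9.99 - rounding happens before tax is applied",
            "pytest billing/test_invoice.py::test_rounding"),
])
```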
Monitoring (post 6) is the backstop for everything the tests don't cover. An alert is a continuously running test case in production. The discipline of knowing what you're not testing, and having monitoring that catches it when it breaks, is what separates "we ship fast" from "we ship fast and find out immediately when something goes wrong."
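The "alert as a continuously running test case" framing translates almost literally into a synthetic probe: exercise the thing customers pay for on a schedule, and page someone when it fails. A minimal sketch, with a placeholder endpoint and thresholds:

```python
import time
import urllib.request


def checkout_probe(base_url: str, timeout_s: float = 3.0) -> bool:
    """Synthetic check for the thing customers actually pay for: can they check out?"""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(f"{base_url}/health/checkout", timeout=timeout_s) as resp:
            ok = resp.status == 200
    except Exception:
        ok = False
    elapsed = time.monotonic() - started
    # In production this result feeds the alerting system; a failure pages someone.
    return ok and elapsed < timeout_s


if __name__ == "__main__":
    if not checkout_probe("https://example.invalid"):
        print("ALERT: checkout probe failed - customers may not be able to pay")
```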
The Uncomfortable Truth
If you have been investing in these things - solid specs, strong architecture, reliable tests, tight CI, good monitoring - you are in a very good position. You have the foundation that makes agentic development work. Adding AI to your workflow is an acceleration of something that's already moving in the right direction.
If you haven't been investing in these things, the agents didn't fix that. They made the gaps more visible and more expensive.
This is the part nobody wants to say out loud, because it sounds like "you should have done your homework." That's not quite what I mean. Most teams made reasonable tradeoffs given their constraints. Technical debt accumulated because shipping features was more urgent than paying it down. Tests were skipped because the deadline was real. Architecture evolved organically because you were too busy building to stop and design. These are understandable decisions.
What's changed is that those decisions now have higher carrying costs. The gaps that were manageable at human development velocity become significant at agent velocity. The implicit knowledge that kept things from breaking when experienced engineers were the only ones touching the code is no longer sufficient when agents are also touching the code.
The question isn't whether you should have done this earlier. The question is what you're going to do about it now.
Starting Points
You don't fix the foundation all at once. Here's where to start.
Find your most expensive onboarding problem. Where do new engineers take the longest to become productive? Where are the bodies buried? That's where you build the context map first - the document that turns a five-day task into a forty-minute one. (See post 2 for what that looks like in practice.)
Kill your flaky tests. Every single one. Not "deprioritize." Kill them - quarantine them immediately, then fix them or delete them. A test suite with known flakiness cannot be trusted by an agent, and a test suite an agent can't trust is worse than no test suite at all. (Post 4.)
Audit your CI error messages. Pick the last five CI failures your team has had and ask: could an agent read this error message and know exactly what to fix? If the answer is no, you have specific work to do. (Post 5.)
Identify what you're not monitoring. Pick the three most important things your customers pay you for. Do you have alerts for each of them? If one broke silently, how long until you'd know? (Post 6.)
None of these require a major project. They're incremental, they're targeted, and they compound quickly. The foundation doesn't get built all at once. It gets built one decision at a time, by teams who understand why it matters.
Why This Moment Is Actually a Good One
Here's the part of the conversation about AI and software engineering that I think gets lost in the existential dread: this is a moment that rewards people who understand systems deeply.
The engineers who will thrive in the next few years are not necessarily the ones who can generate the most code per hour. They're the ones who understand what good looks like - in a test suite, in an architecture, in a CI pipeline, in a monitoring strategy. They're the ones who can define the spec that makes the AI useful instead of the spec that produces 80% of the wrong thing. They're the ones who recognize when an agent has optimized for a metric instead of solving the actual problem.
Those skills are hard-earned. They come from years of writing tests, debugging production incidents, designing systems that survived high load, and making architectural decisions you had to live with for a decade. They are not skills that AI is going to make obsolete. They're the skills that make AI worth having.
The fundamentals aren't going anywhere. They're just more urgent - and, finally, obviously valuable to people who couldn't see why before.
This is post 7 of 7 in The Boring Parts Matter: Engineering Fundamentals for the Agentic Era.

