What AI-Led Engineering Actually Looks Like in Practice

Let me be specific about what we're doing, because specificity is what most of these conversations are missing. We use Claude Cowork, Codex, and V0. Not as experiments running in a sandbox somewhere. Not as a pilot program managed by two engineers with extra time. As the primary way we ship. the actual execution model for engineering and product work across the organization. The question we're asking is not "does this work?" We're past that. The question is how fast we can expand the footprint.

That framing matters. When AI-assisted engineering is a pilot, it produces pilot-quality conclusions: controlled conditions, selected use cases, enthusiastic participants. When it's the actual model, you learn different things. You learn what breaks at scale, what requires more human judgment than you expected, and which parts of the original job description were load-bearing in ways no one realized until they changed.

What Actually Changes

The speed of a first draft changes dramatically. A feature that used to take two engineers three days to get to a reviewable state now reaches that point in a fraction of the time. This is not a marginal improvement. it's a different order of magnitude, and it changes how you think about exploration. When a first draft is cheap, you try more things. You validate assumptions faster. You kill bad ideas earlier, before they've consumed two sprints and developed internal advocates.

Who can produce working code changes. This is the one that surprises people most. Product managers and operations staff who understand what needs to be built but have never written production code can now produce functional prototypes. Not production-ready systems, but working things that communicate intent with precision, that can be handed to an engineer for hardening rather than described in a requirements document and hoped about. The translation layer between "what we need" and "what the code does" has gotten dramatically thinner.

Code review has a new layer. Reviewing AI output requires different instincts than reviewing human output. When a human engineer writes something questionable, there's usually a trail. a commit message, a comment, a pull request description. that explains what they were trying to do. AI output is confident without being transparent about its reasoning. It can be wrong in ways that look exactly like right. The reviewer needs to be more skeptical, not less, than they were before. The skill of knowing what to look for has become more important, not less.

The cost of exploration goes to near zero. This is the compounding effect that people underestimate. Cheap exploration doesn't just mean more experiments. it means a different culture around risk. You stop protecting ideas because ideas are no longer expensive to test.

What Doesn't Change

Architectural judgment doesn't change. The model will generate code. It will not decide what the right system design is for a healthcare data platform with specific latency requirements, a legacy data pipeline, and a compliance boundary that it only vaguely understands. That judgment still requires a senior engineer who knows the context and has done this before.

Domain knowledge doesn't change, and this is critical in healthcare. Our domain is risk adjustment, clinical quality measurement, and value-based care. That means HEDIS measures, CMS regulations, STARS methodology, the specific logic of how a gap in care gets identified and closed and documented. These are not things the model knows with precision. It has broad awareness. It does not have the specificity required to make production decisions in a regulated clinical domain.

The model doesn't know that HEDIS measure specifications change annually, that a specific exclusion criterion is applied differently across commercial and Medicare Advantage populations, or that a particular payer interprets a coding requirement in a way that's unique to their contract. Your engineers do. That knowledge is the job.

Accountability for what ships doesn't change. This one sounds obvious but needs to be said explicitly, because there is a temptation. especially in organizations under pressure to move faster. to treat AI output as somehow pre-validated. It is not. Every line that goes to production is the team's responsibility, regardless of which part of the stack generated it.

The ability to recognize when the model has confidently produced something wrong doesn't change either. If anything, it becomes more important. AI systems are good at producing output that sounds authoritative. The skill of healthy skepticism. of looking at a clean, well-structured result and asking "but is this actually correct for our specific situation?". is not a skill you can outsource. It has to be trained in, and it has to be valued by the team's culture.

The Engineer's Job Description Gets Rewritten

The human engineer is not eliminated. Their job description is rewritten, and the rewrite moves them up the stack. They are no longer primarily responsible for the mechanics of producing code, for translating a well-understood requirement into correct syntax. They are responsible for architectural decisions, for domain-specific validation, for the judgment calls that determine whether the AI's output is correct in this specific context.

Put differently: the engineer is now responsible for things they can no longer blame on the difficulty of writing the code. Before, "this took longer than expected" could mean the implementation was complex. Now, complexity of implementation is less available as an explanation. What remains is the complexity of the problem itself. which was always the hard part, and is now more visibly the hard part.

In healthcare tech specifically, the stakes of this shift are higher than in most domains. A hallucinated output in a consumer app is a bad user experience. A hallucinated output in a risk adjustment or clinical quality platform can produce a wrong clinical recommendation, a billing error, or a compliance failure. The productivity gain from AI-assisted engineering is real. The oversight requirement is non-negotiable. They are not in tension, but they require being held simultaneously, with discipline, by people who understand both what the tools can do and what the domain requires.

This is what AI-led engineering actually looks like when it's the real model and not a demo. More speed, more access, more exploration, and more responsibility sitting exactly where it always sat: with the people who know what the system is supposed to do and what happens if it doesn't.

Five Things We've Learned Running AI as the Execution Model

01 The first draft gets cheap; the review gets more important. The time savings are real, but they shift to the front of the process. You save on generation; you cannot save on validation. Invest in review discipline or you will pay for it downstream.

02 Domain knowledge becomes the scarce resource. The model is broad and fast. Your engineers know HEDIS, CMS, your specific payer contracts, your data model. That knowledge is what makes the output useful rather than merely plausible. Protect it and build it deliberately.

03 Widen access deliberately. PMs and ops staff with working prototype capability change the product conversation. Give them the tools and a clear boundary about what goes through engineering before it ships.

04 In regulated domains, oversight is not optional. The productivity gain and the compliance requirement are both real. Don't let one crowd out the other. If your AI-led process doesn't have a clear human checkpoint for clinical and billing outputs, fix that before you scale.

05 The engineer's accountability goes up, not down. When code generation is cheap, what remains is judgment. Make sure your team knows this is a promotion in responsibility, not a reduction in relevance. The framing matters for morale and for quality.