Context

The five levels of AI coding

Dan Shapiro wrote the definitive framing of the five levels of AI coding, modeled on the SAE's levels of driving automation:

| Level | Analogy | What you are |
|---|---|---|
| 0 | Your parents' Volvo | Manual coder |
| 1 | Lane-keeping assist | Occasional AI user |
| 2 | Highway Autopilot | AI pair programmer |
| 3 | Waymo with safety driver | Manager drowning in diffs |
| 4 | Robotaxi | PM who writes specs and waits for tests |
| 5 | Dark Factory | A black box that turns specs into software |

“It’s dark, because it’s a place where humans are neither needed nor welcome.” — Dan Shapiro

I came across Dan's post while living between Level 3 and Level 4. At work — Level 3: the safety driver in the Waymo, reviewing diffs, the human in the loop on every decision. On POCs and hobby projects — Level 4: writing specs, leaving the agent to run, checking if tests pass. NLSpec itself is Level 4 work. Dan's framing was the wake-up call: even Level 4 feels like you're done. You're not.

Nate Jones Video

Then I watched a talk by Nate Jones — go watch it. It added a critical dimension to the picture.

The core argument:

  • Intelligence is being commoditized
  • Tokens are a unit of intelligence
  • Organizations are no longer driven by headcount — they’re driven by a developer’s ability to convert intelligence spend into business value
  • The key metric: how many tokens the business can provision to each developer

Three emerging developer career tracks:

  • AI Architect — foundational infrastructure, reliable systems on top of probabilistic AI, overall AI architecture
  • System Operator (Optimizer) — manages velocity, AI-assisted coding process, the dark factory workflow
  • Domain Translator — technical fluency + deep domain expertise, bridge between technology and the real world

My two cents

This three-track model is transitional, not permanent. By H2 2027 into 2028, as AI development matures and becomes more framework-driven, I expect the tracks to begin re-converging into a familiar pattern:

| Today’s AI Track | Tomorrow’s Role |
|---|---|
| AI Architect | Systems Engineer |
| System Operator | Software Engineer |
| Domain Translator | Business Systems Analyst |

The one-person AI developer will exist — but as a niche, not the norm. And by late 2028 into early 2029? If the tooling matures the way I think it will, we may be riding a full AI-driven economic boom. Fingers crossed.

I’ve seen this movie before — three times:

  • Console gaming (1980s): Atari bedroom coders → Nintendo’s licensing model and cartridge costs locked out individuals → by the SNES/Genesis era it was almost entirely studios with capital
  • App Store (2008): Explosion of indie developers → by 2012-2014, corporate teams with capital dominated the top charts — a solo developer’s $5 app created in weeks no longer stood a chance
  • PC/Mobile indie (2010s): Unity lowered the barrier → indie games had a golden window → studios with QA and marketing took over the top grossing charts

The platform matures. The economics consolidate. The lone wolf becomes the exception. AI-driven development is on the same trajectory — and the tools I build today need to scale beyond the solo developer.


NLSpec

StrongDM’s Software Factory pattern is built for today’s world — the 3-developer AI team. Their principles are the right foundation:

  • Seed → the initial spec, PRD, or codebase
  • Validation → end-to-end harness, as close to the real environment as possible
  • Feedback loop → output fed back into inputs, self-correcting, compounding correctness
  • The loop runs until holdout scenarios pass — and stay passing

This is the right philosophy. But in 2-3 years, organizations will be running dozens of agents across hundreds of specifications. A markdown file handed to an agent won’t scale.

Spec-Driven Development (SDD) is the correct pattern. NLSpec is the formalization of it at organizational scale.

  • StrongDM coined the term “NLSpec” — a human-readable spec directly usable by coding agents
  • I am adopting their terminology and formalizing it into an evolving specification for organizational-scale AI development
  • NLSpec stands on StrongDM’s shoulders, not in competition with them

NLSpec Is Defined by NLSpec

The NLSpec specification — the document that defines what NLSpec is, how it works, what the element types are — is itself written as an NLSpec.

The spec for the spec is a spec. This isn’t a cute trick. It’s the bootstrap proof.

Why this matters philosophically:

  • If NLSpec can describe itself completely and unambiguously, it proves the format is expressive enough to describe real systems
  • It means you can load the NLSpec spec into a fresh MCP server and let agents reason about the system they’re working inside

Why this matters practically:

  • The 15-section structure was designed by using it to describe a real, complex system (itself)
  • Every element type — RECORD, FUNCTION, SCENARIO, ENDPOINT, PIPELINE — appears in the NLSpec spec
  • The cross-reference system (USES, USED BY, SEC tags) was exercised on a spec with dozens of interdependencies

Check the repository — the framework spec is itself an NLSpec document. When the MCP server is built and the scenarios run, it should validate cleanly against the schema it defines. That’s the next post.
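
That self-check can be made concrete with a toy validator: collect every declared element, then verify that every USES target resolves to a declared element. The line grammar below is invented for the example — the real NLSpec syntax is defined in the spec itself — but the idea (dangling cross-references are detectable mechanically) is the point.

```python
import re

# Toy spec with an assumed, illustrative line format (NOT the real
# NLSpec grammar). "RECORD Session" is used but never declared.
SPEC = """\
RECORD User
FUNCTION CreateUser USES: RECORD User
FUNCTION DeleteUser USES: RECORD User, RECORD Session
"""

declared = set()   # every element the spec defines
uses = []          # every element some other element claims to USE

for line in SPEC.splitlines():
    m = re.match(r"(RECORD|FUNCTION) (\w+)(?: USES: (.*))?$", line)
    if m:
        kind, name, targets = m.groups()
        declared.add(f"{kind} {name}")
        if targets:
            uses += [t.strip() for t in targets.split(",")]

# Any USES target that was never declared is a broken cross-reference.
dangling = [t for t in uses if t not in declared]
print(dangling)  # ['RECORD Session']
```

A real validator would also check USED BY symmetry and SEC tags, but the mechanism is the same: the spec is structured enough to be checked, not just read.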


The Bootstrap Phases of NLSpec

NLSpec meets you where you are. Nobody starts with an organizational spec — everyone starts with one file. The phases describe how the tooling grows with you:

| Phase | What You Have | What You Get | When to Move On |
|---|---|---|---|
| 0 | 1 spec, 1 agent | Template + CLAUDE.md, no tooling | Spec exceeds 3,000 lines |
| 1 | 1 large spec | Bootstrap MCP server, 8 tools | Need to split by concern |
| 2 | N specs | Namespaces, cross-spec queries, decomposition | Multiple projects/teams |
| 3 | N projects, N teams | Org-scale management, A2A orchestration | |

Phase 0 is where everyone starts — and where NLSpec is right now. Copy the template, fill it in, hand it to an agent. No server, no MCP, no namespaces. Just a markdown file. This already works with any agent today.

Each phase is self-contained. You don’t need Phase 2 to get value from Phase 1. You don’t need Phase 1 to get value from Phase 0.
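
Read as a decision rule, the phase table might be sketched like this. Only the 3,000-line Phase 0 trigger is stated in the table; the other conditions are paraphrases of the "What You Have" column, and the function itself is illustrative:

```python
def suggest_phase(spec_lines: int, num_specs: int, num_teams: int) -> int:
    """Illustrative phase chooser derived from the phase table."""
    if num_teams > 1:        # N projects, N teams -> org-scale management
        return 3
    if num_specs > 1:        # N specs -> namespaces, cross-spec queries
        return 2
    if spec_lines > 3000:    # the stated Phase 0 exit trigger
        return 1
    return 0                 # one spec, one agent, no tooling

print(suggest_phase(spec_lines=800, num_specs=1, num_teams=1))  # 0
```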

The 6 Agent Operating Modes

Within any phase, the agent operates in one of six modes. The mode is determined by how you instruct the agent — not by the state of the codebase.

| Mode | Trigger words | What the agent does | Modifies spec? | Modifies code? |
|---|---|---|---|---|
| SPEC | “spec”, “refine”, “write spec”, “add scenarios” | Refines and improves the spec | Yes (with approval) | No |
| DESCRIBE | “explain”, “walk me through”, “what does this do” | Explains the spec in plain language | No | No |
| IMPLEMENT | “implement”, “build”, “create” | Generates codebase from spec | No | Yes |
| FIX | “fix”, “bug”, “broken”, “failing” | Targeted bug fix | No | Yes |
| VALIDATE | “validate”, “test”, “verify” | Runs tests, reports results | No | No |
| CONSOLIDATE | “consolidate”, “absorb patches”, “clean up” | Absorbs patches, refactors | No | Yes |

How the modes map to StrongDM’s loop:

  • Seed → SPEC mode. You’re defining the contract. The agent checks for missing elements, proposes scenarios, validates cross-references. No code written.
  • Validation → IMPLEMENT mode. The agent reads the spec section by section, builds the full codebase, writes tests for every SCENARIO, and does not stop until all pass. The spec is the harness.
  • Feedback loop → VALIDATE + FIX. The agent runs the suite, reports what’s broken, makes targeted fixes. Does not refactor, does not add features.
  • Cleanup → CONSOLIDATE. Accumulated patches are absorbed into clean structure. All scenarios must still pass.

The key insight: each mode has a hard boundary. SPEC mode never touches code. IMPLEMENT mode never touches the spec. FIX mode never refactors unrelated code. These boundaries are what keep agents from going rogue.


The Template: Common Nouns and Verbs

Every NLSpec document follows a 15-section structure with a controlled vocabulary of typed elements — parseable by both humans and agents.

Element Types:

| Element | What it is | Required fields |
|---|---|---|
| RECORD | Data structure | USED BY references |
| FUNCTION | Unit of behavior | USES + THROWS |
| SCENARIO | Test case | [SEC:x.y] tag + tier ([SMOKE], [AFFECTED], [FULL]) |
| ENDPOINT | API surface entry point | |
| PIPELINE | CI/CD definition (inside the spec) | |
| CONFIG | Runtime configuration value | |
| ENUM | Constrained value set | |

Cross-reference system:

  • Every element carries typed relationships
  • A FUNCTION that USES a RECORD creates a graph edge
  • When an agent modifies a RECORD, it walks the graph to find every dependent FUNCTION — before making any change
  • Agents assemble a minimal context slice by walking edges, not reading the whole document — this is how large specs stay manageable
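
A minimal sketch of that walk, assuming a toy in-memory edge list. The element names and storage here are hypothetical; only the USES/USED BY relationship comes from NLSpec:

```python
from collections import defaultdict

# Hypothetical elements: each maps to the elements it USES.
USES = {
    "FUNCTION CreateUser": ["RECORD User"],
    "FUNCTION RenameUser": ["RECORD User", "FUNCTION CreateUser"],
    "FUNCTION Healthcheck": [],
}

# Invert USES to get the USED BY edges the agent walks.
USED_BY = defaultdict(list)
for src, targets in USES.items():
    for t in targets:
        USED_BY[t].append(src)

def dependents(changed: str) -> set[str]:
    """Collect every transitive dependent of a changed element."""
    seen, stack = set(), [changed]
    while stack:
        for dep in USED_BY[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen

# Changing RECORD User affects both functions that touch it,
# directly or through another function.
print(sorted(dependents("RECORD User")))
# ['FUNCTION CreateUser', 'FUNCTION RenameUser']
```

The same traversal, run in the other direction over USES edges, yields the minimal context slice for a task: the element plus everything it depends on, and nothing else.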

The 15-section order is deliberate; the core progression runs:

Abstract → Architecture → Data Model → Core Functions → API Surface → Error Model → Scenarios → Configuration → File Structure → Build and Run

An agent implementing from scratch works section by section with no forward dependencies.


How NLSpec Differs from Existing Tools

Here’s how NLSpec compares to the current landscape:

| | StrongDM NLSpec | NLSpec (this) | GitHub Spec Kit | OpenSpec / BMAD / Kiro |
|---|---|---|---|---|
| What it is | 3 markdown files for one product | A specification | CLI + workflow scaffold | Workflow tools for SDD |
| Spec format | Implicit | Formalized 15-section template with typed elements | Freeform markdown | Freeform markdown |
| Tooling model | None — hand files to agent | MCP server (programmatic access) | Slash commands | Slash commands |
| Element types | Used but not defined | RECORD, FUNCTION, SCENARIO, ENDPOINT, PIPELINE — first-class, queryable | None | None |
| Cross-references | None | USES, USED BY, SEC tags — agents walk edges for context slicing | None | None |
| Multi-spec | Execution graph (pipeline nodes) | Spec composition graph (N specs with typed imports) | One spec per project | One spec per change |
| CI/CD | Not modeled | PIPELINE definitions, FILE_TO_SECTION_MAP | Not modeled | Not modeled |
| Agent access | File read | File read (Phase 0) → MCP tools: get, search, slice, create, patch (Phase 1+) | Slash commands | Slash commands |
| Self-describing | No | Yes | No | No |
| Namespaces | No | Yes — one server manages N specs | No | No |

The core differentiators:

  • StrongDM: “Here are three spec files — hand them to an agent.” (A document.)
  • GitHub Spec Kit: “Here’s a workflow to write specs and feed them to agents.” (A process.)
  • NLSpec: (A formal, evolving specification.)
    • Grows with you — Phase 0 starts with a markdown file, just like StrongDM. Phase 3 manages an entire organization. Same format at every phase, tooling added as you need it.
    • Formal vocabulary in the spec — typed elements (RECORD, FUNCTION, SCENARIO, ENDPOINT, PIPELINE) with controlled verbs (USES, USED BY, THROWS, SEC tags). Not freeform markdown — structured, queryable, composable.
    • Formal vocabulary in the agent prompt — six named modes (SPEC, DESCRIBE, IMPLEMENT, FIX, VALIDATE, CONSOLIDATE), each with defined triggers, boundaries, and behaviors. The agent knows exactly what it is allowed to do and what it is not.

StrongDM coined NLSpec. This work evolves and formalizes it as a specification — giving it structure, tooling, and a growth path to organizational scale.
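
To make the Phase 1+ tool surface concrete, here is a hypothetical in-memory sketch of get, search, and slice. The tool names come from this post; the class, storage, and return types are illustrative guesses, not the real MCP server:

```python
class SpecServer:
    """Toy stand-in for the Phase 1+ spec server (illustrative only)."""

    def __init__(self, elements: dict[str, str]):
        self.elements = elements  # element name -> element body text

    def get(self, name: str) -> str:
        """Fetch one element by name instead of the whole document."""
        return self.elements[name]

    def search(self, term: str) -> list[str]:
        """Find elements whose body mentions a term."""
        term = term.lower()
        return [n for n, body in self.elements.items() if term in body.lower()]

    def slice(self, names: list[str]) -> str:
        """Assemble a minimal context slice from a set of elements."""
        return "\n\n".join(self.elements[n] for n in names)

server = SpecServer({
    "RECORD User": "RECORD User: id, email. USED BY: FUNCTION CreateUser",
    "FUNCTION CreateUser": "FUNCTION CreateUser. USES: RECORD User",
})
print(server.search("email"))  # ['RECORD User']
```

The contrast with Phase 0 is the point: an agent calls these tools to pull exactly the elements it needs, rather than re-reading one large markdown file on every turn.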


Conclusion: Skipping Ahead

Where I was: Between Level 3 and Level 4 — Level 3 at work, Level 4 on POCs and hobby projects. NLSpec itself is Level 4 work.

Where I’m going: Level 5. The Dark Factory.

What that means in practice: Every hobby project from now on is an NLSpec.

Dan Shapiro says he’s at Level 4 — writing specs, leaving for 12 hours, checking if tests pass. NLSpec is what bridges Level 4 to Level 5:

  • Level 4: hand a markdown file to an agent and hope it reads it right
  • Level 5: give the agent a queryable, graph-traversable, MCP-accessible specification it can reason about programmatically

That’s not an incremental improvement. It’s a different substrate.

When intelligence is commoditized and tokens are a unit of intelligence, the leverage isn’t in the code. It’s in the spec.

Where things stand today:

The NLSpec specification is published on GitHub. The spec has been written — but not yet formally validated by implementing against it end to end. That is the next step: running the scenarios and proving the loop closes.

That said, I’m not starting from zero. NLSpec formalizes the same vocabulary and patterns StrongDM already proved out on Attractor — a real production system built entirely using their NLSpec format. The foundation is solid. What remains is validating the formalization itself.

If you want to get ahead of it — read the spec, try the template on your own project, and run it against an agent. I’d love to know what breaks. The spec is only as good as what survives contact with a real implementation, and the more people stress-testing it the better.

The MCP server is coming. More posts to follow.


Have thoughts or found something that breaks? Find me on GitHub or drop a comment below.