Estimated Developer Minutes: A Simple Way to Measure What AI Actually Produces

Everyone wants to measure AI coding productivity, but most metrics miss the point. Lines of code? Meaningless. Number of commits? Gameable. Pull requests merged? Says nothing about complexity.

We needed something better. That's why Actual AI uses Estimated Developer Minutes (EDM) — a concept inspired by Yegor Denisov-Blanch's research at Stanford.

What Is EDM?

EDM answers one question: how long would the average developer take to write this code?

That's it. You take a piece of generated code — a function, a commit, a pull request — and ask an LLM to estimate how many minutes a competent human developer would need to produce the same result from scratch.

A one-line config change? Maybe 2 minutes. A well-structured API endpoint with validation, error handling, and tests? Maybe 45 minutes. A complex state machine with edge case coverage? Could be several hours.
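To make the idea concrete, here is a minimal sketch of what an EDM estimate could look like in code. It is not Actual AI's implementation: the prompt wording, the function names, and the `ask_llm` callable (any function that sends a prompt to a model and returns its text reply) are all assumptions made for illustration.

```python
import re
from typing import Callable

# Hypothetical prompt wording; the article does not specify the real prompt.
PROMPT_TEMPLATE = (
    "Estimate how many minutes a competent human developer would need "
    "to write the following change from scratch. "
    "Reply with a single number of minutes.\n\n{change}"
)


def build_prompt(change: str) -> str:
    """Wrap a function, commit, or pull request diff in the estimation prompt."""
    return PROMPT_TEMPLATE.format(change=change)


def parse_minutes(reply: str) -> float:
    """Extract the first number from the model's reply and treat it as minutes."""
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"no minute estimate found in reply: {reply!r}")
    return float(match.group())


def estimate_edm(change: str, ask_llm: Callable[[str], str]) -> float:
    """Return the Estimated Developer Minutes for one change.

    `ask_llm` is an assumed stand-in for whatever LLM client you use.
    """
    return parse_minutes(ask_llm(build_prompt(change)))
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM provider, and makes it easy to swap in a stub when testing.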

The beauty of EDM is that it captures what actually matters: the value of the work produced, not the volume.

Why EDM Works

Denisov-Blanch's research at Stanford showed that LLMs can predict expert evaluations of code with surprising accuracy. The same capability that lets an LLM review code quality lets it estimate the human effort that code represents.

This aligns with how experienced engineering managers already think. When a senior engineer reviews a pull request, they intuitively know whether it represents 10 minutes of work or 3 hours. EDM simply makes that intuition explicit and scalable.

How Actual AI Uses EDM

Our Management Agent uses EDM to track what AI agents actually deliver. When an agent completes a task, we don't just check if it passed CI — we estimate the developer-equivalent effort it produced.

This gives engineering leaders a metric they can reason about: "Our AI agents produced 840 estimated developer minutes of work this week" is far more meaningful than "our agents made 47 commits."

EDM also helps with prioritization. If an AI agent spends 20 minutes of compute on a task worth 5 EDM, the return is far lower than if it spends the same 20 minutes on a task worth 120 EDM.
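The comparison above is simple arithmetic, and one way to express it is as EDM produced per minute of compute. The sketch below uses the two tasks from the example; the `TaskResult` type, its field names, and the task labels are hypothetical, not part of any Actual AI API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResult:
    """One completed agent task (hypothetical structure for illustration)."""
    name: str
    edm: float               # estimated developer minutes produced
    compute_minutes: float   # agent compute time spent

    @property
    def edm_per_compute_minute(self) -> float:
        """Developer-equivalent minutes gained per minute of compute."""
        if self.compute_minutes <= 0:
            raise ValueError("compute_minutes must be positive")
        return self.edm / self.compute_minutes


tasks = [
    TaskResult("small task", edm=5, compute_minutes=20),
    TaskResult("large task", edm=120, compute_minutes=20),
]

# Rank tasks from highest to lowest return on compute.
ranked = sorted(tasks, key=lambda t: t.edm_per_compute_minute, reverse=True)
for task in ranked:
    print(f"{task.name}: {task.edm_per_compute_minute:.2f} EDM per compute minute")
```

With these numbers the 120 EDM task returns 6.00 EDM per compute minute and the 5 EDM task returns 0.25, a 24x difference for the same compute spend.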

A Practical, Imperfect Measure

EDM isn't perfect. Estimates vary by the LLM doing the estimating. The "average developer" is a simplification. Some code is hard to write but easy to estimate; other code is the opposite.
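Since estimates vary by the LLM doing the estimating, one possible mitigation (our suggestion here, not something the article's source research prescribes) is to ask several models and take the median, while keeping the spread as a rough signal of disagreement. The function name and the model labels below are placeholders.

```python
from statistics import median


def combine_estimates(estimates_by_model: dict[str, float]) -> tuple[float, float]:
    """Return (median EDM, spread) across several estimators.

    The spread is max minus min; a large value suggests the models
    disagree and the estimate deserves a closer look.
    """
    if not estimates_by_model:
        raise ValueError("need at least one estimate")
    values = list(estimates_by_model.values())
    return median(values), max(values) - min(values)


# Placeholder model names and made-up estimates, purely for illustration.
edm, spread = combine_estimates({"model_a": 30.0, "model_b": 45.0, "model_c": 60.0})
print(f"EDM: {edm} minutes (spread: {spread} minutes)")
```

The median is used rather than the mean so that a single outlier estimate does not drag the result.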

But perfect isn't the point. EDM gives teams a consistent, scalable baseline for reasoning about AI productivity — something that lines of code and commit counts never could. And because it's grounded in human-equivalent effort, it stays meaningful even as AI capabilities change.

If you're measuring AI coding output today, start with the simplest question: how long would a person take to do this? You might be surprised how far that gets you.
