DORA Metrics: Engineering Performance Benchmarks

You don't need a DORA dashboard to understand your delivery health

If you have been around engineering teams for a while, you've probably watched this play out: leadership asks how the team is performing, and someone opens a spreadsheet. They count PRs merged this sprint, maybe check Jira for cycle time, pull GitHub stats for deployment count. An hour later, there's a slide deck with five different numbers that don't quite agree.

DORA metrics exist to fix that. The four keys give teams a shared vocabulary for engineering performance. But most DORA explainers stop there: "here's what to measure, go instrument your pipelines." The part that's harder to find is what the numbers actually mean day-to-day, what good looks like, and how to get the signal without running a separate measurement program.

If you're already running work through GoalPath, a lot of that signal is already there. This post maps the four keys to what GoalPath tracks, shows where the gaps are, and explains what Elite performance actually requires.

The four keys, plainly

DORA (DevOps Research and Assessment) identified four metrics that measure software delivery performance, which in turn predicts organizational outcomes. The research has been running since 2014, with annual State of DevOps reports now published by Google Cloud. The four are:

Lead time for changes. How long does it take from code committed to code running in production? This includes review, testing, merge, and deploy. Elite teams get under one day. Most teams take between one day and one week. The long tail (one to six months) is more common than people expect.

Deployment frequency. How often do you ship to production? Elite teams deploy on demand, multiple times per day. Medium performers deploy somewhere between once per week and once per month. Low performers deploy once per month or less. The gap between elite and low is enormous: 182 times more deployments per year.

Change failure rate. What percentage of changes cause a production incident or require a rollback? Elite teams stay under 15%. The DORA data shows the largest cluster of teams sitting at 8-16%, which sounds manageable until you work out the cumulative incident load: at ten deploys a week, a 12% failure rate is roughly 60 failed changes a year.

Mean time to restore. When something does go wrong, how fast do you recover? (Recent DORA reports call this failed deployment recovery time.) Elite performers restore in under an hour. More than half of teams in the 2024 report took between one day and one week, which means customers are affected for days, not minutes.
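To make the definitions concrete, here is a minimal sketch of how the four keys could be computed from a list of deploy records. The `Deploy` record and its field names are hypothetical, not taken from any particular tool, and restore time is measured from the deploy timestamp as a stand-in for when the incident began.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean, median
from typing import Optional


@dataclass
class Deploy:
    # Hypothetical record shape; adapt it to whatever your pipeline logs.
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # caused an incident or rollback
    restored_at: Optional[datetime] = None  # when service recovered


def four_keys(deploys: list[Deploy], period_days: int) -> dict[str, float]:
    """Summarize the four keys for deploys made within one reporting period."""
    if not deploys:
        raise ValueError("need at least one deploy to compute the four keys")
    failures = [d for d in deploys if d.failed]
    # Assumption: the incident starts at deploy time.
    restore_hours = [
        (d.restored_at - d.deployed_at) / timedelta(hours=1)
        for d in failures
        if d.restored_at is not None
    ]
    lead_hours = [
        (d.deployed_at - d.committed_at) / timedelta(hours=1) for d in deploys
    ]
    return {
        "lead_time_hours_median": median(lead_hours),
        "deploys_per_day": len(deploys) / period_days,
        "change_failure_rate": len(failures) / len(deploys),
        "restore_hours_mean": mean(restore_hours) if restore_hours else 0.0,
    }


t0 = datetime(2025, 1, 6, 9, 0)
sample = [
    Deploy(t0, t0 + timedelta(hours=4)),
    Deploy(t0, t0 + timedelta(hours=10)),
    Deploy(t0, t0 + timedelta(hours=30), failed=True,
           restored_at=t0 + timedelta(hours=32)),
    Deploy(t0, t0 + timedelta(hours=20)),
]
result = four_keys(sample, period_days=7)
```

The median is used for lead time because one slow change would otherwise dominate the average. The hard part in practice is not this arithmetic but getting trustworthy timestamps and failure tags into the records in the first place.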

The 2025 DORA report moved away from the four-tier classification (low/medium/high/elite) toward seven team archetypes, which reflects how much variance exists in real teams. But the core measurements haven't changed, and the performance gaps between strong and weak teams are as large as ever.

GoalPath flow metrics cards showing Flow Time, Flow Efficiency, Flow Distribution (59% Features, 6% Bugs, 34% Tasks), and Flow Load

Where DORA runs into product teams

DORA was designed for teams with continuous deployment pipelines. Every metric assumes "deploys to production" as the unit of work. That works well for API services and web apps with mature CI/CD. It works less well for teams shipping iOS apps on a two-week App Store review cycle, or SaaS products where "deploy" means "merge to main and wait for a release train."

Product teams also rarely have the tooling to instrument these metrics accurately. Lead time for changes requires correlating a commit timestamp to a production deploy timestamp. Change failure rate requires tagging incidents to specific deployments. These are engineering-level metrics that require investment to collect cleanly.

That's not a reason to ignore DORA. It's a reason to understand what you can measure with the tools you already have, and where to approximate.

How GoalPath maps to the four keys

GoalPath tracks work at the item level, from when an item is created and started through to when it's finished and delivered. That workflow captures most of what DORA wants to know, in a form product teams can actually use.

Lead time for changes → Flow Time

GoalPath's Flow Time metric shows you two numbers: lead time (from item creation to delivery) and cycle time (from when work actually started to delivery). DORA's "lead time for changes" is closest to cycle time (the active development window). But lead time is what your stakeholders feel.

The Insights page shows both, plus wait time (the gap between creation and when anyone picked it up). A team with cycle time of 2 days and wait time of 12 days isn't a fast team. The work is just waiting to be started.
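The relationship between the three numbers is easy to see in code. This is a sketch under assumptions: the `Item` fields are hypothetical names for the created, started, and delivered timestamps, not GoalPath's actual data model or API.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import median


@dataclass
class Item:
    # Hypothetical field names for illustration only.
    created_at: datetime
    started_at: datetime
    delivered_at: datetime


def days_between(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 86400


def flow_times(items: list[Item]) -> dict[str, float]:
    """Median lead, cycle, and wait time in days across delivered items."""
    return {
        "lead_days": median(days_between(i.created_at, i.delivered_at) for i in items),
        "cycle_days": median(days_between(i.started_at, i.delivered_at) for i in items),
        "wait_days": median(days_between(i.created_at, i.started_at) for i in items),
    }


delivered = [
    Item(datetime(2025, 1, 1), datetime(2025, 1, 13), datetime(2025, 1, 15)),
    Item(datetime(2025, 1, 2), datetime(2025, 1, 12), datetime(2025, 1, 15)),
    Item(datetime(2025, 1, 3), datetime(2025, 1, 17), datetime(2025, 1, 18)),
]
times = flow_times(delivered)
```

For any single item, lead time is wait time plus cycle time. In this sample the medians come out to a 2-day cycle time sitting behind a 12-day wait, which is the "not a fast team" pattern described above.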

Elite DORA teams get lead time under one day for a code change. For product feature work, "under one day" is usually not realistic. Features take longer than hotfixes. The useful comparison is your own trend over time. If your median flow time is improving month over month, you're moving toward elite behaviors even if the absolute number is higher.

GoalPath Insights page showing flow time metrics and question categories for diagnosing delivery issues

Deployment frequency → Velocity and throughput

DORA's deployment frequency measures how often you ship to production. GoalPath's velocity measures how many story points your team completes per work week. They're related but not identical.

A team with high deployment frequency and low velocity is shipping frequently but not much. A team with high velocity and low deployment frequency is building fast but releasing slowly, which often means large, risky batches.

The useful signal from GoalPath's velocity chart is throughput: how much work are you finishing each week, and is that trend stable or erratic? Erratic throughput (big weeks followed by dead weeks) often signals the same problems that hurt deployment frequency: large items, review queues, dependencies waiting for other teams.

If your throughput is consistent week over week, you're exhibiting one of the core behaviors of high-frequency deployers: steady, predictable flow instead of batch-and-release.
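One rough way to put a number on "stable or erratic" is the coefficient of variation of weekly throughput: the standard deviation divided by the mean. This is an illustrative choice, not a DORA or GoalPath definition, and any cutoff you pick for "erratic" is a judgment call.

```python
from statistics import mean, pstdev


def throughput_variation(weekly_counts: list[int]) -> float:
    """Coefficient of variation of weekly completed work (0 means perfectly steady)."""
    avg = mean(weekly_counts)
    if avg == 0:
        return 0.0
    return pstdev(weekly_counts) / avg


steady = [8, 9, 8, 9]     # similar output every week
erratic = [16, 1, 15, 2]  # big weeks followed by dead weeks

steady_cv = throughput_variation(steady)
erratic_cv = throughput_variation(erratic)
```

Both teams finish the same total over four weeks, but the second team's variation is many times higher. That gap, not the total, is what points at batching, review queues, or blocked dependencies.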

GoalPath velocity trends chart showing weekly throughput with Trending Up 141% badge and planned vs unplanned breakdown

Change failure rate → Bug ratio in Flow Distribution

This is the trickiest mapping. DORA's change failure rate is specifically about production incidents caused by a deploy. GoalPath doesn't monitor your production environment. It manages your planned work.

What GoalPath does track is Flow Distribution: the split of completed items across Features, Bugs, and Tasks. If 40% of the items your team completes in a given period are Bugs, that's a signal. Two-fifths of your output is going to things that were already shipped and broke.

High bug ratio in Flow Distribution is a leading indicator of high change failure rate. Teams that ship quality problems generate rework. That rework shows up as the next sprint's bug list before it ever shows up in a DORA dashboard.

Healthy teams in the DORA data tend to have low rework ratios. In Flow Distribution terms, that usually means bugs staying below 15-20% of completed work. It's not a hard benchmark (a team doing active maintenance on a legacy codebase will skew higher), but the trend matters. If bugs as a percentage of completed work are increasing quarter over quarter, your change failure rate is probably climbing too.
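The trend check described here can be sketched in a few lines. The item type labels (`"feature"`, `"bug"`, `"task"`) and the function names are assumptions for illustration; the input is simply the type of each item completed in a period.

```python
from collections import Counter


def flow_distribution(item_types: list[str]) -> dict[str, float]:
    """Share of completed items by type for one period."""
    counts = Counter(item_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: n / total for kind, n in counts.items()}


def bug_share_rising(periods: list[list[str]]) -> bool:
    """True if the bug share increased in every consecutive period."""
    shares = [flow_distribution(p).get("bug", 0.0) for p in periods]
    return len(shares) >= 2 and all(b > a for a, b in zip(shares, shares[1:]))


q1 = ["feature"] * 8 + ["bug"] * 1 + ["task"] * 1
q2 = ["feature"] * 7 + ["bug"] * 2 + ["task"] * 1
q3 = ["feature"] * 6 + ["bug"] * 3 + ["task"] * 1

rising = bug_share_rising([q1, q2, q3])
```

Here the bug share climbs from 10% to 20% to 30% across three quarters. The absolute numbers matter less than the direction, which is the point about legacy codebases skewing higher.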

GoalPath milestones list showing all milestones with status, business value scores, goal path, and team assignments

Mean time to restore → Item aging in Flow Load

DORA's MTTR measures how fast you recover from a production incident. GoalPath doesn't track incidents, but it does track aging WIP items and how long they've been in progress.

Flow Load shows you items currently in progress and how many are stale or stuck. A stuck item is one that has been in progress for more than 30 days. Stuck or stale items are often held up by external dependencies, unclear requirements, or decisions waiting on stakeholders.
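The 30-day rule is simple enough to express directly. This is a sketch, assuming you can export each in-progress item with the date work started; the item names and the dictionary shape are made up for the example.

```python
from datetime import date, timedelta

# Threshold from the definition above: in progress for more than 30 days.
STUCK_AFTER = timedelta(days=30)


def stuck_items(in_progress: dict[str, date], today: date) -> list[str]:
    """Names of items that have been in progress longer than the threshold."""
    return sorted(
        name
        for name, started in in_progress.items()
        if today - started > STUCK_AFTER
    )


wip = {
    "checkout-redesign": date(2025, 1, 10),  # 50 days in progress
    "billing-export": date(2025, 1, 30),     # exactly 30 days, not yet stuck
    "sso-login": date(2025, 2, 20),          # 9 days in progress
}
stuck = stuck_items(wip, today=date(2025, 3, 1))
```

The comparison is strict, so an item at exactly 30 days is not flagged until the following day. Passing `today` in as an argument keeps the function easy to test against fixed dates.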

The analogy to MTTR isn't perfect. Stuck feature work isn't the same as a production outage. But the capability that drives fast MTTR is the same capability that clears stuck or stale items fast: clear ownership, fast decision-making, and a culture where problems surface quickly instead of aging silently.

Teams that let items sit stuck for weeks are usually the same teams where incidents last days. The organizational habits that create long MTTR create long item aging. Watching Flow Load gives you early warning before the incident queue tells you about it.

GoalPath board view showing items across Not Started, Started, Finished, Delivered, and Accepted columns by milestone

What Elite actually requires

The DORA data shows that 19% of teams reach elite performance on the four keys. What makes the difference is not tooling. It's a set of organizational practices that elite teams have and lower performers don't.

Small batches. Elite teams ship small changes frequently. They don't accumulate two weeks of work into a large release. Small batches mean smaller blast radius when something goes wrong, faster cycle times, and easier rollbacks.

Deployment confidence. Teams that deploy multiple times per day have invested in automated testing and deployment pipelines. They don't deploy manually. They don't have a "release manager" who coordinates deploys. The pipeline does it.

Fast feedback loops. Elite teams know within minutes if a deploy caused a problem. Staging environments don't cut it. You need production observability.

Decision speed. This is the one GoalPath can help with directly. A common source of long lead times and stuck items is not technical complexity. It's slow decisions. Whose sign-off is needed? Who reviews this change? Who decides if this is good enough to ship? Teams where those answers are unclear will struggle to hit elite performance regardless of their deployment infrastructure.

The ritual GoalPath replaces

The typical DORA measurement program looks like this: someone in engineering leadership decides the team needs metrics, they instrument a data pipeline, pull data from GitHub and Jira, build a dashboard in Grafana or Tableau, and run a quarterly review.

Three months later, no one is looking at the dashboard. The quarterly review doesn't change anyone's behavior. The data is accurate but not actionable.

GoalPath doesn't replace DORA instrumentation for teams that need it. If you have dedicated SREs and are deploying fifty times a day, you need proper observability tooling. But for product teams of five to twenty engineers who just want to understand whether their delivery health is improving or degrading, GoalPath's flow metrics page gives you the signal you need from work your team is already doing.

No additional instrumentation. No separate dashboard. No quarterly review cycle. The metrics update as items move through your workflow, and the weekly progress report surfaces the trends automatically.

If something is wrong (flow time is climbing, bug ratio is up, five items have been in progress for more than 30 days), the Insights page shows it. You can act on it in this week's planning instead of discovering it in a quarterly report.