The software world gave up too soon on measuring development productivity, deeming it impossible. A few years ago a new wave of research arrived that proved otherwise. After discussing the foundations for this measurement approach, this article will share some practical tips for instrumenting a modern high-performance and continuously improving development organization.
Foundations for measuring software engineering
It’s no wonder measuring software development productivity hasn’t worked so far. The discussion easily gravitates toward measuring individual performance, dumbing it down to a single number, or getting the organization to game the metrics to receive recognition.
In the 2018 book Accelerate, Dr. Nicole Forsgren, Jez Humble, and Gene Kim took a scientific approach to find measures that would lead to increased organizational performance. Forsgren has continued advancing the book’s ideas and recently published the SPACE framework as an approach for thinking about measurement holistically. Developed by Forsgren, Margaret-Anne Storey, and several Microsoft researchers, the SPACE framework considers:
Satisfaction and well-being
Performance
Activity
Communication and collaboration
Efficiency and flow
The idea is quite simple: measuring developer productivity is easy to get wrong, but that’s not a reason to close your eyes to the information. By taking a holistic approach, ranging from flow of work to quality metrics and surveying the developers, you gain understanding to accelerate the team’s learning.
Another classic book that focuses mostly on the flow of work is the ground-breaking Principles of Product Development Flow by Donald G. Reinertsen. The key themes revolve around building an economic model for understanding software delivery, managing queues, reducing the batch size, and limiting work in progress.
Great organizations build feedback loops to support decision-making and move faster than the competition. It’s up to you to evaluate what to measure in your own context.
Having built several high-performing product development organizations, I’ve ended up implementing something like the SPACE framework on several occasions. For me, it’s easier to categorize the dimensions into three parts: Impact, Flow, and Health. Let’s first discuss what I mean by Impact.
Impact – are we driving business outcomes?
This is the most important question: what are we getting in exchange for the investment in product development? The answer should not be a list of features but rather your business and its key metrics moving in the right direction.
In practice, Impact metrics are often owned by Product Managers, and companies that do goal-setting with something like Objectives and Key Results (OKRs) use Impact metrics to align teams.
In a modern product development organization, agreeing on the objectives and outcomes is a way to empower the teams, as Marty Cagan and Chris Jones write in the 2020 book Empowered. Traditional organizations control team backlogs from the top at the feature level, while modern organizations empower teams to adjust their own backlogs as long as the team keeps business objectives in mind.
Feature adoption
The starting point for most software products is to track feature adoption. You might use products like Amplitude, Pendo, or Heap to track the events or maybe something like Segment to relay events to your data warehouse.
You’re going to appreciate a flexible solution since you don’t know all your research questions upfront. For example, in our business (building SaaS tools for data-driven engineering management), we like to focus on metrics about onboarded teams rather than individuals, a use case that something like Google Analytics does not serve well.
Feature adoption doesn’t automatically guarantee growing revenues and profits. It’s a reasonable assumption that people who use your product are likely to pay for it, but this assumption also deserves a critical look. Once you have this data, it’s easy to analyze how well different features correlate with successful onboarding.
Comparing feature adoption between paid and non-paid users
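As a rough illustration of the comparison above, here is a minimal sketch that computes, per feature, the share of paid and non-paid teams that have used it at least once. The event records, team identifiers, feature names, and the `adoption_by_segment` helper are all hypothetical; in practice the data would come from your analytics tool or data warehouse.

```python
from collections import defaultdict

# Hypothetical usage events: (team_id, feature) pairs exported from an analytics tool.
events = [
    ("team-a", "dashboards"),
    ("team-a", "alerts"),
    ("team-b", "dashboards"),
    ("team-c", "alerts"),
    ("team-c", "alerts"),  # repeated use by the same team counts once
    ("team-d", "dashboards"),
]

# Hypothetical account data: all onboarded teams, and the subset that pays.
all_teams = {"team-a", "team-b", "team-c", "team-d"}
paid_teams = {"team-a", "team-b"}


def adoption_by_segment(events, all_teams, paid_teams):
    """Return {feature: {"paid": share, "non_paid": share}} of teams using each feature."""
    adopters = defaultdict(set)
    for team_id, feature in events:
        adopters[feature].add(team_id)

    non_paid_teams = all_teams - paid_teams
    result = {}
    for feature, teams in adopters.items():
        result[feature] = {
            # Guard against empty segments to avoid dividing by zero.
            "paid": len(teams & paid_teams) / len(paid_teams) if paid_teams else 0.0,
            "non_paid": (
                len(teams & non_paid_teams) / len(non_paid_teams)
                if non_paid_teams
                else 0.0
            ),
        }
    return result


rates = adoption_by_segment(events, all_teams, paid_teams)
for feature, shares in sorted(rates.items()):
    print(f"{feature}: paid={shares['paid']:.0%}, non-paid={shares['non_paid']:.0%}")
```

Counting teams rather than raw events keeps one very active team from dominating the picture. A large gap between the two segments is a prompt for further investigation, not proof that the feature drives conversion.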
Investment categories
Another way to think about Impact is through the investment you’re making. Do you understand how much of your engineering effort goes to your company’s top priorities, and how much goes to keeping the lights on, dealing with technical debt, and so on?
The reality is that sometimes it’s difficult to attribute changes in business metrics to individual actions. It is more effective to understand how much you’re investing in specific priorities and see if that investment is getting you what you want.
Investment distribution between categories of work
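One simple way to get at this distribution is to label completed work with a category and add up the effort per category. The sketch below assumes each work item carries a category label and an effort estimate in person-days. Both the labels and the numbers are made up for illustration, and `investment_distribution` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical completed work items: (category label, effort in person-days).
work_items = [
    ("top priority", 30),
    ("keeping the lights on", 10),
    ("technical debt", 5),
    ("top priority", 5),
]


def investment_distribution(items):
    """Return each category's share of the total effort as a fraction between 0 and 1."""
    totals = Counter()
    for category, effort in items:
        totals[category] += effort

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: effort / grand_total for category, effort in totals.items()}


distribution = investment_distribution(work_items)
for category, share in sorted(distribution.items(), key=lambda kv: -kv[1]):
    print(f"{category}: {share:.0%}")
```

The unit of effort matters less than applying it consistently. Counting issues or pull requests instead of person-days would work with the same function.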
Business-specific measures
You’ll need to come up with a leading indicator that helps you understand your own business model faster than you could with lagging indicators such as revenue and profits.
Slack famously established that teams that have sent over 2000 messages have tried the product properly, and 93% of them are still using Slack today. This kind of proxy could be useful for developers responsible for optimizing the onboarding experience, but it might also be a company-wide KPI to align sales and marketing.
A SaaS business could focus on enterprise customers to drive up their Average Revenue per Customer. Yet the numbers lag due to an existing customer base and a successful SMB business. A better option might be looking at conversion rates and churn for companies of a certain size. Focusing on these metrics might lead the team to focus on user management, security & compliance, etc.
A team building a robot for picking up products in an e-commerce fulfillment center might want to optimize for the error rates of picking the wrong product or dropping the product.
Some metrics will stay intact for years while others may serve a more short-lived initiative.
Reinertsen suggests that “if you measure one thing, measure the cost of delay.” In the absence of a more exact method of measuring the cost of delay, these business metrics are likely your best bet.
Flow – are we delivering continuously without getting stuck?
The unfortunate reality about complexity in software is that if you just keep doing what you’ve been doing, you’ll keep slowing down. When starting a fresh project, you’ll be surprised by how much you can accomplish in a day or two. And in some other environment, you could spend a week trying to get a new database column added.
Understanding the flow of work is critical because many of the issues are systemic. Even the most talented developer might not have the full picture of how much time is being wasted when work is bounced between the teams, half-completed features are put on the shelf as priorities change, or all the code gets reviewed by just one person. It’s easy to think that you’re solving a quality problem by introducing code freezes and release approvals, but in reality, it might not be worth it.
The flow of work is often measured in terms of cycle time. The term comes from manufacturing processes, where cycle time is the time it takes to produce a unit of your product, and lead time is the time it takes to fulfill an order (from request to delivery).
In software development, these terms are often mixed. For most features, it might not be reasonable to track the full lead time of a feature, as in time from a customer requesting a feature to its delivery. Assuming that the team is working on a product that’s supposed to serve a number of customers, it’s unrealistic to expect that the features would be shipped as soon as the team first hears the idea.
Therefore, this article will discuss Flow in terms of cycle time for code, cycle time for issues, deployment frequency, and deployment error rates.
Cycle time
When talking about cycle time for code, we’re talking about the time it takes for code to reach production through code reviews and other process steps. Sometimes it’s called change lead time.
Cycle time is the most important flow metric because it indicates how well your “engine” is running. The point is not to worry about slow activity but rather the periods of inactivity.
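To make the distinction between slow activity and inactivity concrete, here is a minimal sketch that computes the cycle time of a change and the longest gap between two consecutive events on it. The change identifiers, event timestamps, and helper names are hypothetical. Real data would typically come from your version control and deployment tooling.

```python
from datetime import datetime, timedelta

# Hypothetical event timestamps per change, from first commit to production deploy.
changes = {
    "change-1": [
        datetime(2024, 1, 8, 9, 0),    # first commit
        datetime(2024, 1, 8, 15, 0),   # review requested
        datetime(2024, 1, 10, 15, 0),  # review completed
        datetime(2024, 1, 10, 17, 0),  # deployed to production
    ],
    "change-2": [
        datetime(2024, 1, 9, 10, 0),   # first commit
        datetime(2024, 1, 9, 12, 0),   # review completed
        datetime(2024, 1, 9, 14, 0),   # deployed to production
    ],
}


def cycle_time(timestamps):
    """Elapsed time between the first and the last recorded event of a change."""
    return max(timestamps) - min(timestamps)


def longest_idle_gap(timestamps):
    """Longest stretch between two consecutive events, a proxy for waiting time."""
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return max(gaps, default=timedelta(0))


report = {
    change_id: (cycle_time(stamps), longest_idle_gap(stamps))
    for change_id, stamps in changes.items()
}
for change_id, (total, idle) in report.items():
    print(f"{change_id}: cycle time {total}, longest idle gap {idle}")
```

In this made-up data, most of the first change’s cycle time is a single two-day wait for review, which is the kind of inactivity worth discussing as a team. This sketch uses wall-clock time, so nights and weekends count toward the gaps.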
When diagnosing a high cycle time, your team might have a conversation about topics like this:
What other things are we working on? Start by visualizing all the work in progress. Be aware that your issue tracker might not tell the whole truth because development teams typically work on all kinds of ad-hoc tasks all the time.
How do we split our work? It’s generally a good idea to ship in small increments. This might be more difficult if you can’t use feature gates to enable features to customers gradually. Lack of infrastructure often leads to a branching strategy with long-lived branches and additional coordination overhead.
What does our automated testing setup look like? Is it easy to write and run tests? Can I trust the results from the Continuous Integration (CI) server?
How do we review code? Is only one person in the team responsible for code reviews? Do I need to request reviews from an outside technology expert? Is it clear who’s supposed to review code? Do we as a team value that work, or is someone pushing us to get back to coding?
How well does the developer know the codebase? If all of the software was built by someone who left the company five years ago, chances are that development is going to be slow.
Is there a separate testing/quality assurance stage? Is testing happening close to the development team or is the work “handed off” to someone on the outside?
How often do we deploy to production/release our software? If the test coverage is low, you might not feel like deploying on Fridays. Or if deployment is not automated, you won’t do it after every change. Deploying less frequently increases the batch size of a deployment, adding more risk to it, again reduce