How Platform Engineering Enables the 10,000-Dev Workforce
About one in four developers work overtime at least half of every month, according to a survey of devs and engineering leaders at large and mid-sized organizations.
Perhaps not surprisingly, more than half of the developers in the survey cited burnout as the reason their colleagues leave their jobs.
And even if they stay, 45% of developers said they don’t have enough time for learning and development — something almost required by the complexity of DevOps tooling. (The study also found that developers use an average of 14 different tools in their work.)
The State of Developer Experience 2024 report, a survey by Harness, a software delivery platform, queried 500 engineering leaders and developers at U.S. and Canadian companies with 250 or more devs.
The report, released in February, pointed to some of the usual suspects that cause all that overwork for developers: lack of automation for time-wasting toil, the confusion and lack of standardization that accompanies too many tool choices, and insufficiently mature DevOps processes.
The result is too many frustrating bottlenecks for developers attempting to build and deliver software. In fact, 60% of companies in the survey said they are releasing code on only a monthly or quarterly basis.
For multinational enterprises working at scale — airlines, banks, telecommunications companies, and so on that employ thousands or tens of thousands of developers — that’s a woefully slow cadence. And due to those organizations’ size, the cost of all that toil, complexity and frustration multiplies exponentially.
In such organizations, platform engineering offers a compelling solution to enable business competitiveness in a way that truly enhances developer experience, commonly called DevEx. For instance, the report found that reducing unnecessary toil through automation results in a 37% improvement in developer productivity.
Developers Feel the Pain of Toil
A big problem faced by developers, according to Jyoti Bansal, founder and CEO of Harness, is that they’re not spending enough time on the work that matters most to them and their organizations.
“If you look at the data, about 40% of software developer time is toil not related to writing code,” he told The New Stack.
Instead, nearly 30 million developers worldwide, he said, are struggling with “toil related to what happens after writing code — deploying, building, testing, security, governance, compliance.”
Developer toil is any repetitive, manual and/or low-value work. One of the pillars of DevOps is automation. The Rule of Three says that since it is three times as difficult to create reusable content, wait until the third time you’ve done the same thing and then automate or refactor it.
Internal developer platforms like Harness, Bansal contended, automate and simplify most of that developer grunt work.
“There’s a lot of talk about how do you make developers more efficient at writing code,” Bansal said, but “there’s a lot of what needs to be done after writing code as well.”
And, as we’ve previously written, writing code is what developers want to do most often anyway, so it’s best to concentrate on optimizing the other aspects of the developer role that distract from that. Unfortunately, so much of generative AI for devs focuses on code generation.
The Cost of Tool Sprawl
There’s a lot of waiting around in the life of an enterprise developer, according to Harness’ report.
“If you’re a developer who wrote code and you’re waiting 40 minutes for your code to build, that’s 40 minutes mostly wasted,” Bansal said.
Yes, you could risk context switching to another task, but that can lead to context loss if an hour later you find that something went wrong with that first build. “You want to finish the task you were on before you move to the next task,” Bansal said. “So can we speed up that 40-minute build and reduce it down to 10 minutes?”
It’s the same, he pointed out, when code is getting deployed. Devs have to be on call to troubleshoot, which can have them sitting around for two hours or more. In the end, this tooling breadth takes its toll: a whopping 97% of developers said they context switch because their tools are from multiple vendors.
On top of this interruption of flow state, developers don’t have tight enough feedback loops and are bearing the weight of the ever-growing cognitive load. The result is a measurably poor developer experience. And it’s worse for newcomers: 71% of those surveyed said it takes at least two months to get onboarded at their respective orgs.
In addition to this, the same developer experience survey by Harness found that it takes more than a week for developers to learn new DevOps tools. And about 60% of respondents said it takes at least a week to build internal tooling.
Platform Engineering’s Consolidated Approach
One of the main goals of platform engineering is to rein in the sprawl of developer tooling, by providing a “golden path” for devs with a finite number of tools.
The Harness researchers did the math and found that, on a team of 1,000 developers at a $100,000 salary per head, consolidating tooling through a platform engineering strategy could yield:
- A 53% increase in developer productivity.
- A 15% reduction in time spent on deployments.
- A potential 158,000 working hours saved annually, which translates to a savings of $7.9 million.
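The dollar figure in the last bullet can be sanity-checked with some back-of-the-envelope arithmetic. The sketch below is not the report's own method: it assumes a 2,000-hour working year (50 weeks at 40 hours), a figure the report does not state, and the function names are made up for illustration. Under that assumption, a $100,000 salary works out to $50 per hour, and 158,000 hours at that rate comes to $7.9 million.

```python
# Back-of-the-envelope check of the savings figure quoted above.
# ASSUMPTION (not from the report): a working year is 2,000 hours
# (50 weeks x 40 hours). All function names here are illustrative.
WORKING_HOURS_PER_YEAR = 2_000


def annual_savings(hours_saved: int, salary: int,
                   hours_per_year: int = WORKING_HOURS_PER_YEAR) -> float:
    """Value the saved hours at the developer's implied hourly cost."""
    hourly_cost = salary / hours_per_year
    return hours_saved * hourly_cost


def hours_saved_per_developer(hours_saved: int, team_size: int) -> float:
    """Spread the total saved hours evenly across the team."""
    return hours_saved / team_size


if __name__ == "__main__":
    savings = annual_savings(hours_saved=158_000, salary=100_000)
    per_dev = hours_saved_per_developer(hours_saved=158_000, team_size=1_000)
    print(f"Estimated annual savings: ${savings:,.0f}")
    print(f"Hours saved per developer per year: {per_dev:.0f}")
```

Spread across the 1,000-developer team, the same figure amounts to 158 hours per developer per year, or roughly four working weeks.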
The result of this and several other developer experience surveys is a demand to improve DevEx.
“It’s pretty eye-opening how much work is needed to improve developer experience,” Bansal said. “That their experience matters in your bottom line.”
Added to this, dev tool sprawl is causing organizational subscription fatigue and exposing your data to risk across a plethora of platforms.
What’s Slowing Developers Down?
The complexity and lack of automation slow down the overall speed to deliver software.
Of the companies surveyed, 60% are releasing code on either a monthly or quarterly basis. That places them at the low to medium level of DevOps maturity on the DORA metrics, at least for deployment frequency.
Reflecting on their current employers, developers and engineering leaders in the Harness survey pointed to why they are releasing at this slower cadence:
- Forty-four percent of respondents said testing code end-to-end isn’t fast or efficient.
- Forty-two percent said deploying code to production isn’t fast or efficient.
- Sixty-seven percent said developers are waiting over a week to complete code reviews.
- Forty-two percent of developers feel they cannot release code to production without risking the introduction of failures.
- Thirty-nine percent of developers experience failures at least half the time.
- Thirty-two percent of companies lack high unit test coverage.
- At 28% of companies, it takes at least a day to build and test artifacts.
- Developers still run manual rollbacks at 67% of companies.
Lack of Automation Hurts Security
A concerning 40% of developers surveyed by Harness don’t feel their organizations enforce good security and governance policies across the software development life cycle (SDLC). To further exacerbate the problem:
- Forty-one percent of dev teams don’t have automated security and governance policies.
- Forty-two percent of dev teams don’t have robust identity and access management policies.
And while roadblocks are put up with safety in mind, security and governance are often seen as slowing feature teams down. Speed shouldn’t come at the cost of security; rather, security should be a conduit for speed.
Progressive delivery techniques are commonly built into an internal developer platform like Harness and limit the blast radius of change. CI/CD, feature flags, resiliency testing and security testing all help you fail and recover faster.
In a mature platform organization, Bansal said, most security checks should be automated so developers aren’t “blocked [waiting] on someone to approve something or someone to review something.”
A Case Study: United Airlines
Making developers in a sprawling enterprise more efficient, Bansal said, is not about asking them to work harder. It’s about removing bottlenecks and eliminating or speeding up inefficiencies.
Consolidation of tooling with a platform engineering strategy is definitely part of the solution.
When United Airlines was planning in 2022 to scale its business more than ever, it aimed to move 80% of its workloads to Amazon Web Services, an effort that included a migration from a monolithic to a microservices architecture and a move away from legacy tooling.
“We were pushing our teams to migrate to the cloud, but that has a chain of dependencies, and developers had to move much faster,” said Ratna Devarapalli, director of IT at United Airlines.
By moving to the Harness CI/CD platform, including reusable templates, automated pipeline generation and deployment optimization, the airline was able to increase automation and self-service deployments.
“We were able to give the governance policies to the developers and create the guardrails we needed,” Devarapalli said. “Harness gives us a platform rather than just a DevOps tool.”
This resulted in 75% faster deployment times and a significant reduction in context switching.
“If it’s a fully automated pipeline with all the checks and balances built in with automated rollbacks and automated verifications, then you don’t need to be,” waiting around for the build to succeed or not, Bansal said. “You want things to fail. You want to deploy more frequently. But you want failure to not impact anything.”
Measure to Improve
Want to know what’s blocking your developers? Ask them!
Developers’ frustrations are typically a good place to start measuring developer experience. But when you are trying to do that with tens of thousands of developers, trusting their instincts may not scale.
How do people even know how much time they spend doing any given task? Bansal asked: “The software engineering industry doesn’t have good ways to measure things.”
Harness has a Software Engineering Insights module, which allows teams to discover bottlenecks in their software development life cycles, assess team productivity and measure developer experience. This feature has over 200 integrations, bringing together data at the team and issue level from sources like:
- Jira tickets.
- GitHub commits.
- Jenkins builds.
- Harness CD deployments.
- Incident management tooling.
Instead of measuring individual developer productivity, Bansal says his platform works to measure the quality of the engineering process.
“A data warehouse of every single thing that goes on in your SDLC,” Bansal called it. “You get a full timeline of everything that goes on, for different projects, for different applications, for different business units.”
From there, the module offers cross-organizational visibility with:
- Productivity scores.
- DORA metrics.
- Agile metrics.
- Investment analysis.
- Goals tracking.
- Industry benchmarking.
Some of the more interesting developer productivity metrics, according to Bansal, include lead time, tracked from the moment a ticket is picked up in Jira.
This timeline tracking also offers context for bug tracking. And it connects the product to the reality of engineering.
“Someone says: I need this feature and by the time developers pick it up if it’s weeks and weeks or months of delay,” Bansal said, that could be a signal to investigate blockers.
“Once the developers pick it up, and by the time it’s built, what happens in there is how many code commits were needed, how much things were needed for [pull request] reviews — a developer does a change, and someone has to review it. And that’s two days wasted.”
The build can likely be reduced with tech, he said, but don’t overlook the process. If it takes two days for a peer review, do you need to free up more senior developers?
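One way to picture the timeline Bansal describes is to split a ticket's journey into stages and see which one dominates. The following is a minimal sketch of that idea, not a description of how the Harness module works: the `TicketTimeline` fields, the stage names and the sample timestamps are all hypothetical, and a real pipeline would pull these events from tools such as Jira and GitHub.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TicketTimeline:
    """Hypothetical event timestamps for one feature request."""
    requested: datetime         # someone asks for the feature
    picked_up: datetime         # a developer takes the ticket
    review_requested: datetime  # pull request opened
    review_done: datetime       # pull request approved
    deployed: datetime          # change reaches production


def stage_durations(t: TicketTimeline) -> dict[str, timedelta]:
    """Break the full timeline into consecutive stages."""
    return {
        "waiting for pickup": t.picked_up - t.requested,
        "coding": t.review_requested - t.picked_up,
        "waiting for review": t.review_done - t.review_requested,
        "build and deploy": t.deployed - t.review_done,
    }


def lead_time(t: TicketTimeline) -> timedelta:
    """Lead time measured from ticket pickup to production."""
    return t.deployed - t.picked_up


def biggest_bottleneck(t: TicketTimeline) -> str:
    """Name of the stage that consumed the most elapsed time."""
    durations = stage_durations(t)
    return max(durations, key=durations.get)


# Made-up sample: weeks before pickup, a two-day review wait, a 40-minute build.
sample = TicketTimeline(
    requested=datetime(2024, 3, 1, 9, 0),
    picked_up=datetime(2024, 3, 18, 9, 0),
    review_requested=datetime(2024, 3, 20, 9, 0),
    review_done=datetime(2024, 3, 22, 9, 0),
    deployed=datetime(2024, 3, 22, 9, 40),
)

if __name__ == "__main__":
    for stage, duration in stage_durations(sample).items():
        print(f"{stage}: {duration}")
    print("Lead time from pickup:", lead_time(sample))
    print("Biggest bottleneck:", biggest_bottleneck(sample))
```

In this made-up sample, the 40-minute build is the smallest slice of the timeline; the wait before pickup and the two-day review dominate, which is the kind of process signal the metrics are meant to surface.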
And remember not to measure just for speed, he continued, but for quality and the impact in production too.
The sociotechnical process of software development isn’t as linear as we think, Bansal said, but it is measurable: “You have to flip the whole narrative from measuring the developer to measuring the process and tuning and optimizing the process.”