DevOps: Manage the Cognitive Load

Speed, cost and quality are considered the vectors that you can control. If you are moving to the cloud to get the “ilities” that comprise quality, you need to seriously consider Cognitive Load.

Over the last 10 years I’ve seen a lot of different software architectures and systems at a lot of companies. Every team and system has its own history, its own technical debt, its own velocity and set of capabilities. Some teams are staffed more completely than others. Some are farther along the cloud journey than others. Some have people with more current skills than others.

What’s not often discussed or evaluated is the level of “Cognitive Load” the team has to carry.

How I Define Cognitive Load

Cognitive Load is the sum total of all the knowledge and understanding needed in order to do your job. It includes the base education that got you to where you are, and it includes all the new material you need to understand how your job is changing. Some jobs need huge amounts of knowledge but don’t change that fast. I used to be a nuclear power plant operator and supervisor on a US Navy submarine. It took two years of training before I stepped onto the boat, and generally it takes another year to be fully qualified. But after that, the job did not change all that fast.

If your job is building and/or operating today’s software then you have both a massive amount of residual knowledge needed AND it’s changing incredibly fast. If your job includes migrating software to cloud, then it’s a double load: you need to know how the current system works, and you also need to understand all the cloud technology needed for how your system will work in the cloud. If your software is written in, say, Java, and you are porting it to golang then it’s a double load: you need to know the Java ecosystem and also the golang tools and ecosystem. If you are moving from VMs to containers, same thing. All of those vectors of change are not additive. They are multiplicative.
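To make the additive versus multiplicative point concrete, here is a minimal sketch in Python. The factor of 2.0 for each change simply encodes the “double load” described above; the function names and the idea of a single numeric load are illustrative assumptions of mine, not a measured model.

```python
from math import prod

# Hypothetical relative load factors. 2.0 stands for the "double load"
# of knowing both the old world and the new one.
change_factors = {
    "data center -> cloud": 2.0,
    "Java -> golang": 2.0,
    "VMs -> containers": 2.0,
}


def additive_load(base, factors):
    """Load if each change only added its own extra share of the base load."""
    return base * (1 + sum(f - 1 for f in factors))


def multiplicative_load(base, factors):
    """Load if every change compounds with every other change."""
    return base * prod(factors)


base = 1.0  # today's load, in arbitrary units
factors = list(change_factors.values())

print("additive:      ", additive_load(base, factors))        # 4.0
print("multiplicative:", multiplicative_load(base, factors))  # 8.0
```

Under these assumed numbers, three simultaneous changes look like four times the load if you add them up, but eight times if they compound. Every additional change widens that gap.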

Think about that for a minute.

Today, companies are moving towards “Cloud Native” architectures. Which usually means Kubernetes (k8s). Which itself then implies containers. And a new monitoring system, probably oriented around “observability.” And often it includes moving to a “streaming” data ingest model, possibly with Kafka or Amazon Kinesis or the like. And if you are going there, you almost certainly want to adopt a “DevOps” posture, meaning using Continuous Integration and Continuous Deployment (CI/CD). That means you also need automated, continuous testing, and almost certainly a whole new set of automation tools. Jenkins may have served you well deploying to long-lived servers in a data center, but it’s going to be a poor fit if you are doing k8s.

BOOM. The brain explodes. This is a LOT of material.

Managing the Load

Product management is about planning what your product needs to do in order to be of the most value to customers. It usually includes planning trade-offs in the development of that product: delivering useful bits of the product incrementally, and adapting the product as you gain experience and customer feedback. Engineering designs the architecture and builds the product. Management plans the funding, staffing, and work efforts.


Your teams have to learn a lot of material, and there is no “one size fits all” approach to up-skilling your teams. It’s a learning journey. Building the new architecture is a journey too, usually refactoring the software, shifting parts into microservices, decoupling sub-systems, using a “strangler fig” approach to untangle your monolith. That takes planning. The learning journey to cope with the cognitive load also takes planning.

Don’t underestimate this. I personally have seen a project get into trouble because the team did not invest enough in how they were going to operate a new technology. If you think you can just adopt something like Kafka and it “just works,” then you are going to have a bad quarter. You have to plan how your teams are going to learn all the new tech. You have to plan for the cognitive load.

Optimization and Choices

Back to the vectors: speed, cost, quality. The old saying is “pick any two.” Quality is the sum of all the “ilities” (reliability, sustainability, operability, scalability, etc.). The point of adopting new cloud technology is to get those “ilities.” But clearly doing so will slow you down, and cost you more.

You do have some choices. You can choose how fast you adopt new technologies, and which technologies. If you move to containers, do you really need to jump right to k8s? Can you consider Docker Swarm? What about HashiCorp Nomad instead? Both have substantially lower learning curves. Can you spread the change over a longer period? What really is driving your need to make the changes in the first place?

Sometimes you just have to bite the bullet and adopt change. But consider the time to learn, training costs, and most importantly learning to OPERATE the new technology. Does your team know how to troubleshoot it when it fails? How do you recover? Have you actually practiced that?


Teams have a certain amount of energy they are able to deliver. Some companies try to push teams harder to get more energy delivered per unit time, but ultimately the energy a team has is finite. Some of that energy will be spent learning the new technologies. You may be able to hire some new talent who already has the skills, but even that takes time. And money. There is a talent war going on. In all likelihood you will need to up-skill your existing teams. And that means making careful choices about how much Cognitive Load you are taking on.
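One rough way to see the finite-energy trade-off is to treat a quarter as a fixed budget of hours and subtract the learning from it. The sketch below is Python, and every number in it (team size, hours per week, hours of learning per technology) is a made-up assumption for illustration, as is the function name.

```python
def delivery_capacity(team_hours, learning_hours):
    """Hours left for delivery once learning is paid out of a fixed budget."""
    if learning_hours > team_hours:
        raise ValueError("learning plan exceeds the team's total capacity")
    return team_hours - learning_hours


# Hypothetical team: 5 people, 40 hours a week, 12-week quarter.
team_hours = 5 * 40 * 12

# Hypothetical plan: 3 new technologies, 40 hours each, for every person.
learning_hours = 5 * 3 * 40

remaining = delivery_capacity(team_hours, learning_hours)
print(remaining, "of", team_hours, "hours left for delivery")
print("share of the quarter spent learning:", learning_hours / team_hours)
```

With these assumed figures, a quarter of the team's time goes to learning before anyone has operated the new technology in production. Adopting fewer technologies at once, or spreading them over more quarters, is what moves that number.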