Seems like everyone is clamoring for DevOps folks. Do they really know what they want? That term means all things to all people. Some thoughts.
What is DevOps?
Just about six years ago I wrote a blog post, How To Do DevOps, that is as accurate today as it was then. Boil it all away and you need to align the interests (and reward structures) of Operations and Development, and you need to treat Operations as if it were a software problem. Three years ago I blogged about Cloudy DevOps and how it’s all a process. That’s still true too.
But conversations with a few folks recently have got me thinking more about this. It’s good to get some review, and some basic agreement in place.
DevOps and CI/CD are answers to a business problem. Consumers expect both web and mobile software for everything. Uber shows you how far away your ride is, live on a map. FedEx shows you every step of where your package is, and when it will arrive at your doorstep. Kaiser Permanente shows you test results online the day after giving blood. Every company, regardless of its product, is becoming a software company. New features and especially security fixes are expected to occur frequently. The approaches to building software from even 10 years ago no longer meet the expectations of consumers. DevOps has arisen as a collection of tools and processes to reduce the time and cost to create new software and then deploy and operate that software. Continuous Integration (CI) automates the merging of new software code and the associated testing. Continuous Deployment (CD) automates installing and configuring software and software updates.
Most folks I hear talking about DevOps are mostly talking about CI/CD. Which is short-sighted. But let’s run with that for now. Despite what many vendors convey in their marketing, DevOps is more than just tools. To accomplish the business goal, it requires a combination of technology, process, and people.
Technology
The best technology for DevOps depends on the technology stack in use, but also has implications for the processes and the people involved. Jenkins has been a mainstream CI tool for years, but is less well suited for newer serverless systems, or for container orchestration systems like Kubernetes (k8s). Kubernetes has changed the operational posture completely by automating much of traditional “operations.” This spawned a new set of tools (helm charts for deployments, for example) and new products (Flux, Argo). Similarly, “observability” has overtaken simple monitoring, with Prometheus leading to Thanos and the OpenTelemetry efforts leading to Honeycomb. The rate of change in these technologies is significant.
Think about it. Only 11 years ago OpenStack was the darling of many Enterprises. Only 5 years ago VCs were still investing in Mesos/Marathon platform companies. And in a few years, k8s will be replaced with something newer and better. Gasp! Can it be? Yes, the cutting edge will replace k8s. More on that below. The important thing to realize is that all of this changes, faster than you want it to. As changes continue the “right” CI/CD/DevOps technology will change again. Expertise in these technologies requires not just learning, but actual doing. The smart Technology Leader will realize that they really need folks with experience migrating between technologies. Hands-on skills with any one technology have a short half-life. Understanding of first principles and the ability to learn is what an organization needs. Beware teams and Engineers who learned it once and want to coast. This game is the fastest changing game in tech if you account for the scope of knowledge required. Hardware is easy even though it changes fast, because at least it’s limited in scope.
I do think there’s plenty of room for new technology here. Recently I blogged about on-prem and cloud CI/CD. Having a declarative way to create/define/configure a mixed on-prem and cloud environment, one that is immutable like k8s is, seems to be the way to go. But it feels like we need to let some ML (or just smart defaults) define the hardware. We should be able to describe our workloads and let smarts figure out what basic hardware deployment is needed. But I digress. Some of what those new technologies will need to address are the issues I raise below.
Process
The purpose of DevOps is to reduce the time it takes to obtain the business value from the software. Having a CI/CD solution that can deploy multiple updates per day has limited value if the development team cannot make meaningful changes that often. Many companies try to become “agile,” but how agile are most teams? Many try and only accomplish “mini-waterfall” delivering new software at the end of two or four week sprints (or worse). The agile promise is more than that. True agile allows fixes and improvements to be deployed multiple times per day, perhaps thousands of times per day. User stories can be accepted and deployed independently and atomically without waiting for some arbitrary “sprint boundary.” Appropriate automated testing can validate the work in near real-time. Changes can be deployed and A/B tested in production, widening the deployed footprint automatically, or rolling back changes that indicate a problem. There are technology tools that enable all of those things, but the development processes - and people - must support that in order to realize those benefits.
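To make "widening the deployed footprint automatically, or rolling back" a bit more concrete, here is a minimal Python sketch of that control loop. Everything in it is an assumption for illustration: the function name `progressive_rollout`, the traffic steps, the 1% error threshold, and the `observe_error_rate` callback (which stands in for whatever your observability stack reports) are hypothetical, not any particular tool's API.

```python
from typing import Callable, Sequence, Tuple


def progressive_rollout(
    observe_error_rate: Callable[[int], float],
    steps: Sequence[int] = (1, 5, 25, 50, 100),  # hypothetical traffic percentages
    max_error_rate: float = 0.01,                # hypothetical health threshold
) -> Tuple[str, int]:
    """Widen exposure step by step; roll back on the first unhealthy reading.

    observe_error_rate(percent) is a stand-in for querying monitoring
    after the new version has been serving `percent` of traffic.
    Returns (outcome, final traffic percent for the new version).
    """
    for percent in steps:
        if observe_error_rate(percent) > max_error_rate:
            # The change indicates a problem: send all traffic back to the old version.
            return ("rolled_back", 0)
    # Every step stayed healthy, so the new version takes all traffic.
    return ("promoted", 100)


if __name__ == "__main__":
    # A change that only misbehaves once it sees real load.
    print(progressive_rollout(lambda pct: 0.05 if pct >= 25 else 0.001))
```

The tooling part is the easy bit. The sketch only works if the team can agree on what "unhealthy" means and trusts the automation enough to let it roll back without a meeting, which is a process and people question.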
So when you think about DevOps and want modern CI/CD and do multiple releases per day, could you? Can your process support that? And track it? Probably not. There’s a whole pile of process you need to put in place. And skills, and more tools. Which leads us to people.
People
Processes are performed by people. Software is written and then deployed/operated by people. Those people need the right skills. Moving to cloud requires a substantial new set of skills, but serverless, containers, container orchestration, distributed architectures, and consuming managed services have exponentially increased the required skill set for the product development team. New tools to track the work? New skills to learn. And yes, like k8s, those too will change. In a few years there will be new tools.
Even more important than skill-set is the mind-set of the team. As companies go global and become more dependent on software, the expectations on the staff change. DevOps (and especially DevSecOps) merges what used to be disparate, separately skilled teams into a group that collectively becomes responsible for keeping things working and making things better. “Normal working-hours” white collar knowledge workers from a few years ago are now on-call. Operations staff are now expected to at least read code, and perhaps even debug it. People change more slowly than technology. The expectations, culture and compensation of the team must be managed differently.
Or do they? I strongly believe that DevOps means that the interests of Operations and Development need to be aligned. There’s another way to solve the problem than having Developers learn ops and Operators learn development: Google’s SRE model. I won’t dig into that now, but the whole field of “Resiliency Engineering” and “Reliability Engineering” really boils down to being able to have high feature velocity AND high uptime at the same time. The SRE approach has two key ideas to it: a team of Engineers who specialize in these things that acts as an overlay organization, and more importantly, a rock-solid commitment to a “budget” of reliability. When you want to make changes and you don’t have room in the availability budget, Development teams work instead on creating more budget through debt reduction, better designs/abstractions, ops tools, etc. There are whole books on this, so I won’t go on. If you don’t think you can staff your Dev and Ops teams with folks who can move towards each other’s skills, you probably need to be thinking about going SRE.
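The reliability "budget" is just arithmetic, and a small Python sketch may help show how it gates feature work. The function names, the 30-day window, and the 99.9% target below are my own illustrative assumptions, not anything prescribed by the SRE model; real teams often budget on failed requests rather than minutes of downtime.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of downtime allowed in the window for an availability target.

    slo is a fraction, e.g. 0.999 for "three nines" (an assumed example target).
    """
    return (1.0 - slo) * window_days * 24 * 60


def can_ship_features(slo: float, downtime_minutes: float, window_days: int = 30) -> bool:
    """True while there is budget left; otherwise the team works on creating budget."""
    return downtime_minutes < error_budget_minutes(slo, window_days)


if __name__ == "__main__":
    budget = error_budget_minutes(0.999)  # roughly 43 minutes per 30 days
    print(f"budget: {budget:.1f} min")
    print("ship features?", can_ship_features(0.999, downtime_minutes=50.0))
```

Under these assumptions, a 99.9% target leaves about 43 minutes of downtime per 30 days. A team that has already burned 50 minutes is out of budget, so its next sprint goes to debt reduction and ops tooling instead of features.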
Conclusion
Accomplishing the business goal of delivering software faster and cheaper requires leadership in all three of these areas. The technology will change much faster than the processes, and the people will change slower still. Expertise in today’s technology is vital, but it is also critical to plan for the inevitable change in those technologies. The leadership you seek may be more than a single person or role. You may need one or more senior technologists fluent in the low-level technologies today, as well as seasoned leaders who have managed these kinds of transitions before and can look around corners to see the problems of tomorrow.
At the end of the day, DevOps, CI/CD, SRE are all just names for things. The business goal is to deliver more features in the same amount of time, and “always work.” That’s what consumers have come to expect, and that’s what every business needs to try to deliver. Don’t drown in the alphabet soup. Keep your mind on the business goal, remember that it’s all going to change faster than you want it to, and get the best people you can. And yes, this flies in the face of what most of your tool vendors are telling you. My favorite question for them is to have their Engineering Leaders come tell me how they do DevOps. If it’s not better than how you are doing it, what makes them think their tool is all that great?