We are not technical. When we work with teams, our primary focus is the relationships and culture. At the same time, we need to be able to speak the same language with our teams who sometimes are very technical. How do we go about it?
In my recent live stream, I wanted to take a bit of a different approach and share some things that I learned about DevOps. I wanted to break it down from a more nontechnical point of view.
DevOps is a technical approach, and there are a lot of engineering practices that are associated with that, but in the end, it is something that we definitely should know if we work with developers.
I chose the DevOps Handbook because I already had it. And also because the authors of the book are technically the creators of the concept. So if someone knows what DevOps is supposed to be – it’s them!
Watch the full stream or keep reading to learn more about what I learned from the DevOps Handbook that I took as my number one source of truth about DevOps.
What is DevOps?
The DevOps Handbook doesn’t really give us a one-statement type of answer to what DevOps means.
But I found this definition that I believe reflects it well. This comes from the AWS website:
“DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes. This speed enables organizations to better serve their customers and compete more effectively in the market.”
So in simple terms, DevOps provides us with a set of cultural and technical practices that allow organizations to build high-quality products that fulfill customers’ needs.
The key part to highlight here is that DevOps includes both the technical AND the cultural (or behavioral) side of things.
I really want to highlight this point because if you just search online for DevOps courses and certifications, you will be greeted with a lot of, even mostly, technical stuff.
Such as shown on this image:
And this is basically the same as saying Agile implementation is about Jira, Confluence, GitHub, and Slack.
So, when you do your own research on DevOps, remember to also look into the cultural practices.
I actually got so confused that I decided to reach out to some cloud engineers I worked with for some advice and clarification.
I asked:
And I got my suspicions confirmed with this message:
Obviously, technical people are focusing a lot on the technical stuff. But, we are not technical. We can focus on the cultural stuff.
With this framework, we are trying to help the organization build better products that customers want to buy and want to use. We are also considering the well-being of the employees. It’s not just about delivering products and burning out everyone who is working for the company. And so DevOps comes back not only to technical practices but to cultural practices as well.
DevOps is built on what the book calls the three ways, the underpinning principles of DevOps.
- The first way is the principles of flow.
- The second way is the principles of feedback.
- And the third way is the principles of continual learning and experimentation.
Flow, feedback, continuous improvement. Sounds very familiar. Again, we know what it is. We have already seen it.
That infinity sign that we saw earlier – you won’t find it in the DevOps Handbook, actually! I’m not sure where it came from, I didn’t research it. But I was trying to find the sign that they DO have in the book to describe DevOps, and I couldn’t find it online. I had to take a picture from the book itself:
Business here basically means development, the product team. The product team who are working on building the product. On the other side, we have the customer. And the customer is often represented by operations. Basically, people who interact with the customer, who are getting the complaints or requests coming from the customer to them. And often, those two are kind of sitting on two different sides.
They’re, in a way, kind of working against each other because development, business, the product side, they want to deliver as many features as possible. They want to produce as much as possible. On the other side, customers and ops want to have a stable system.
That creates tension between the two. What DevOps is trying to do is connect them, starting with the first way and the principles of flow.
The flow of work comes from development and ends up in ops.
Then, we have the second way, which is feedback. That creates a feedback loop between the customers, operations, and development.
And then inside of it, we have the third way, the principles of continuous learning and experimentation. This image is actually the representation of DevOps.
Cultural and technical practices of DevOps
When we are talking about these three principles, we have cultural practices or cultural changes that we want to create in the organization, and then we have the technical practices that support those cultural changes.
But you can’t go there without cultural changes, actually. So you may implement a practice. But if inherently your team or your organization is not well aligned with what you’re trying to implement, it’s not going to work.
Think about it in the same way as with Scrum. If we just implement Jira, GitHub and Slack and call it Scrum, we won’t be getting any benefits of the framework and things will not work, even if we have the right tools in place.
Yes, those are technical tools that we can implement to help us manage our workflow and create visibility in our product backlog. But if the culture behind them is not actually supporting the principles of transparency, visibility, and collaboration, then Jira becomes a very bad tool and everybody hates it.
It’s the same with DevOps. The technical practices are important, but we need to work on the cultural aspect first.
Cultural practices of the principle of flow
With the principles of flow, we’re trying to create visibility around everything that we do, our processes. We want to create visibility in the flow of work.
We want to know what actually happens from where we start the work, for example, when the customer comes with a request, and until the moment we give that work back to the customer. We’re trying to understand what is happening in this process. I think this is where Scrum falls short.
We look at the Scrum team, and it’s just one part of this workflow. The Scrum team may have finished its part, but there is a lot happening behind the scenes before and after that work.
In DevOps, we’re trying to look at the whole cycle. That’s about visibility.
The principles they have here start with limiting work in progress, so that we can get items from one side to the other as fast as possible instead of having 10,000 things in progress.
Sounds familiar? Sounds like this principle comes from Kanban.
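If it helps to see the idea as code, here is a minimal Python sketch of a work-in-progress limit. The class names, the limit of 2, and the work items are all made up for illustration; none of this comes from the book or from any real tool.

```python
class WipLimitExceeded(Exception):
    """Raised when a column already holds as many items as its limit allows."""


class KanbanColumn:
    """A single board column with a hypothetical work-in-progress limit."""

    def __init__(self, name, wip_limit):
        self.name = name
        self.wip_limit = wip_limit
        self.items = []

    def pull(self, item):
        # Refuse new work until something already in progress is finished.
        if len(self.items) >= self.wip_limit:
            raise WipLimitExceeded(
                f"{self.name} is full ({self.wip_limit} items); finish something first"
            )
        self.items.append(item)

    def finish(self, item):
        self.items.remove(item)


in_progress = KanbanColumn("In progress", wip_limit=2)
in_progress.pull("login page")
in_progress.pull("password reset")

try:
    in_progress.pull("profile page")
    third_item_accepted = True
except WipLimitExceeded:
    third_item_accepted = False  # the limit blocked the third item

in_progress.finish("login page")
in_progress.pull("profile page")  # now there is room again
```

The point of the sketch is the refusal: the third item only gets in once something else is done, which is what keeps the flow moving.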
Then another one coming from Kanban – reducing batch sizes. So that’s about splitting work into small pieces of value where we are able to complete something. Maybe it’s a very small value on its own, but we’re able to get it done, and get it to work with the rest of our system.
Small batches also come from Agile, since the point is to deliver frequently.
We’re reducing the batch sizes to deliver more frequently to the customer.
The next one is reducing the number of handoffs. And that’s really about optimizing the workflow. For example, if we have a requirements team, they first need to approve all of the requirements. Once this is done, they hand it off to developers.
The developers complete their part of the work, and then they hand it off to the next team that is going to release it. That creates a big need for coordination, a lot of problems, and potential risks for things not going the way we planned. And so the principles of flow tell us we need to reduce the number of handoffs and have as much power stored within our dev team as possible.
This comes back to the concept of cross-functionality. We heard about that in Scrum, and we heard about that in Agile principles. Nothing is new.
Then they talk about continually identifying and removing bottlenecks. That comes, once again, from the Kanban method and from optimizing the workflow, where we’re trying to understand which part of our process is becoming a bottleneck.
The bottleneck is the part of the process that slows everything down: the step that cannot get through as many items as the other steps in our process.
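As a rough illustration, spotting a bottleneck can be as simple as comparing how many items each step gets through. This is a minimal Python sketch with invented step names and numbers; real teams would pull these figures from their own board or tooling.

```python
# Hypothetical numbers: how many work items each step finished last week.
items_finished_per_week = {
    "analysis": 12,
    "development": 10,
    "testing": 4,
    "release": 9,
}


def find_bottleneck(throughput_by_step):
    """Return the step with the lowest throughput."""
    return min(throughput_by_step, key=throughput_by_step.get)


bottleneck = find_bottleneck(items_finished_per_week)
print(f"Bottleneck: {bottleneck}")  # prints "Bottleneck: testing"
```

In this made-up example, work piles up in front of testing, so that is where improvement effort would pay off first.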
Question from the audience here: Does DevOps use events similar to the Scrum events?
Interestingly enough, there are no events in DevOps apart from the blameless postmortem. There isn’t a process explaining how to run iterations, and there are no other events related to DevOps specifically.
And then the last one, which also comes from Kanban and Lean management, is to eliminate hardships and waste in the value stream. So, in simple terms, stop doing things that don’t bring value and that create misery within the teams and within the organization.
And what I really like here is they actually define what waste is. Because we often say, we need to eliminate waste in the organization, in the team, and in the work that we do.
What’s waste? Well, it is the use of any material or resource beyond what the customer requires and is willing to pay for.
So if we are building something that nobody wants, for example, nobody cares about, this is waste. If no one is ready to pay for it, if no one is going to use it, it’s waste, we have to eliminate it. That comes back to things like the product backlog if we’re using Scrum.
Cultural practices of the principles of feedback
The next part would be the principles of feedback.
First, make quality everyone’s responsibility.
You know, that meme:
It worked fine in dev. Ops problem now, right? So this is the kind of relationship that is often created between development and operations.
Where a developer says “I finished it, it worked fine. Now I don’t care if it’s not working with the rest of it. So operations will figure it out”.
And so I think that’s kind of a very good meme that represents that relationship. And so the second principle of DevOps is to actually say, “Hey. We all are working on the same product, so everybody is responsible for the quality of that product, including you, developers, and including you, operations”.
We’re trying to make sure that everybody actually has the same goals set. And the same responsibilities, so everybody is accountable for that quality.
Now, we are talking about making sure that people are not afraid to actually provide feedback: don’t shoot the messenger. We also need to have a safe environment for failure, so that if people are finding issues, failing, or not finding the best ways to do something, they’re not afraid to come and talk about it and bring up those problems.
The next one is to see and report problems as they occur. Obviously, if we have a safe environment, we want people to talk about any kind of issues as soon as possible and not wait. When there is a problem and we see it, we need to report it immediately.
Whenever there is an issue, let’s stop, let’s reassess, and make sure that we solve this problem before we continue. And that is another point here in the principles of feedback: swarm and solve problems to build new knowledge. That basically means that if there is a problem, especially a big one, everybody stops doing whatever they’re doing. We all come together as a team, fix that problem, and then continue with the rest of the work.
I think that definitely is something that helps. For example, right now, I’m working with a team that has that kind of mentality: when someone gets stuck, they immediately talk about it, saying “I’m having trouble with this”. Then the team immediately says “Okay, let’s meet right after the daily scrum and figure it out together”. And that helps them fix those problems faster and get back to the product work again.
A lot of this stuff is common sense.
Then we have the practice of finding and fixing problems in our area of control as part of the daily work. Basically, that means you should avoid escalating several levels up to have someone else, like an architect from another team, fix the problem. Instead, you try to have the people who are closest to the work fix it, because they have the most knowledge about it.
And the last part is to prioritize nonfunctional work alongside functional work. This means you should create a culture or a process where operational and nonfunctional requirements are prioritized as highly as user features. That’s the main point here, and it is something I talked about in the tech debt video.
In that video, I said that technical debt is as important as features, especially if it is something that can bite us three months down the line. We want to fix it early, before the disaster happens. It’s the same idea here: make sure you are not only delivering features, but also delivering high quality.
Those are the principles of feedback. Pretty straightforward, right? It’s all about how to create a good environment for feedback.
Cultural practices of the principles of continual learning and experimentation
The third one is the principles of continual learning and experimentation, i.e. continuous improvement. We want to create a culture where we are able to try new things and fail.
We talked about failure already. Some of the things we want to do here are to make sure that in the organization itself, we have a culture of learning and trying out new things, and we have a culture where it’s safe to fail. Again, we’re coming back to those points.
Leaders reinforce the learning culture. So leaders need to be on the same page with everyone. It should come from leadership. Yes, we’re trying to build it from both sides: from the bottom up, but from the top down too.
Leaders need to encourage people to try new things, and when failure happens, they need to be there to support people and help them learn from that failure, and not be that manager who only points out people’s mistakes as something negative.
Coming back to that learning culture, under the same principles, we need to institutionalize the improvement of daily work. Basically, reserve time to fix and improve. You know, kind of like retrospectives.
They don’t have it as an event or a meeting in DevOps, but they clearly state that you must make sure that you reserve the time to evaluate how things are going and find ways of improvement.
Let’s talk about firefighting that often happens in dev teams. Usually, a team enters a vicious cycle where we’re seeing the problems that we need to fix but we never have time to do it. We always say we’ll fix it when we have time, later. We never find time to do it.
The problem is that this brings us into firefighting mode, where there are fires everywhere and we never plan ahead to remove the hazards before the fire starts. That’s what they’re saying: instead of always fixing one problem in one corner when the fire starts, we look at our whole process and try to eliminate the risk of fire by improving our processes in advance.
Plan for that process improvement. That comes back to our product backlog and prioritizing nonfunctional requirements. It all comes back to the same concepts that make a lot of sense already.
Another important cultural practice is to never punish people for mistakes. Even more so, take people who make mistakes and make them experts in the area where they made those mistakes, because now they know it better and they learned from their mistakes.
Elevate people who make mistakes because it means that those people are taking risks and it means that they are learning faster through those risks, through their mistakes.
Every mistake is an opportunity for learning. I have a video on failure that comes back to the same thing: how do we encourage failure?
Transform local knowledge into global improvements.
The authors of the book kind of encourage a lot of documentation. Say we made a mistake, we fixed something and that needs to be documented in some way so people can find it and learn from it.
Have a sophisticated wiki where you collect all of that information, especially when things are not going the right way. That way anyone can actually find this information, read through it, and potentially find the solution to their problem there.
And the last one here is what they call “continually introduce tension to elevate performance”. That basically means trying to break the system on purpose so that you can identify potentially risky areas in your product and fix them before a problem actually occurs.
We want to kind of test our product, poke at it, and see if there are any gaps that we should fill.
So those are the principles. The three ways. And these are just the cultural practices that I talked about.
And it all starts there. This part is where I think we as Scrum Masters, nontechnical people, and leaders can actually help the organization and the team create a successful environment for Scrum. When we implement Scrum, we don’t implement it in a bubble.
We implement Scrum within a huge organization. And so it means that we need to create the environment around it that will make it successful. And so if we follow these principles, if we focus on making these principles a reality for our team in the organization, Scrum will work well.
I think those are just good principles that you can implement alongside any other framework or method you might be implementing right now in your team.
Yeah, and as you can see, this is why I didn’t make a video. I’ve just been talking for like 40 minutes, and I only covered the cultural practices.
We still have the technical practices.
And so, when we go to the technical practices, I guess this is where it can get a bit complicated if you’re not a technical person. But what I think is important is that you understand what those terms mean and what they represent, so that when you’re talking with developers, you are able to provide the right advice and direct them without telling them how to do it.
Technical practices of flow
A lot of these practices come back to CI/CD topics and automation.
It starts with the deployment pipeline. The book talks about having an on-demand, easy-to-set-up deployment pipeline that can be used by the developers. With that, we are trying to reduce handoffs, so that developers don’t have to go and ask operations to create a QA environment to test their work.
Instead, they’re able to do that themselves. So we are making the work much easier for them so that they are able to build whatever they’re building and get it to the end, test it, release it.
That basically means that the developers can easily create development, test, and production or production-like environments to validate that what they’re working on is actually working.
Then we have automated testing. A lot of the time, DevOps is associated with just automation, but that’s only one of the technical practices. We are encouraging teams to automate as much as possible because we want to reduce the amount of time they spend testing while still elevating the quality and making sure that everything works well. We often say that Scrum is impossible without automation. This is because we’re trying to fit everything within one sprint, and the team needs to be able to complete all of the work, including testing, within that sprint.
Often what happens is that teams spend 15 percent of their time developing and 85 percent testing. So we’re trying to make that testing part much easier to do so that there isn’t even a question of whether we test or not. We have all of it automated. And obviously, we want to make sure that even though we are adding a lot of that automation, our pipeline is quick to build.
For a few teams that I worked with, it would take four or more hours to get a build. And then, at the end of the four hours, it might be broken. It means that they have just spent four hours waiting and it’s not working. It’s extremely frustrating. So when you are adding a lot of that testing, you need to be mindful of keeping it fast.
Then they talk about continuous integration and testing. That is a controversial topic. I think a lot of people say that continuous integration and testing are really bad, and the book mentions that as well. So you will probably see that not everybody is for continuous integration and testing.
Basically, it means that we are constantly pushing our code in small pieces into the pipeline, where it is built and tested continuously. There are some challenges associated with that, and some people don’t think it’s a good idea. In this book, they encourage it.
And then they talk about making releases low risk, making it easy for a team to release something into the hands of the customer. There are different techniques and practices that can be implemented there, such as making sure that the product is architected in such a way that releases are low risk.
They’re talking about well-defined APIs and interfaces, so that everybody is on the same page when they’re building whatever they’re building. They’re talking about strangler application patterns. That’s a bit too technical, so I won’t be able to go into that. Then they talk about automation again: automating releases, the deployment process we talked about in the deployment pipeline.
They also mention separating releasing into the customer’s hands from deploying the work. So what it means is that we can be deploying in an environment that looks like production, a production-like environment to verify that it works. But we do not give it immediately into the hands of the customers.
Only when we are ready do we make it available to our customers. One of the techniques they use is feature toggles: we push the work out into production, but customers don’t have access to it, because we have basically said that this feature exists in the code but is turned off right now, so customers cannot use it.
And then we can start turning it on, and we can turn it on for just certain groups of customers or users. And so we can gradually test it. If something stops working or there is a problem, we just toggle the feature off and it’s fine.
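For anyone curious what a feature toggle might look like under the hood, here is a minimal Python sketch. It is only an illustration under my own assumptions: the feature names, the percentages, and the helper functions are hypothetical, and real teams usually rely on a dedicated toggle service rather than a dictionary in code.

```python
import hashlib

# Hypothetical toggle configuration: feature name -> percentage of users who see it.
ROLLOUT_PERCENT = {
    "new_checkout": 25,  # on for roughly a quarter of users
    "dark_mode": 0,      # deployed, but switched off for everyone
}


def user_bucket(feature, user_id):
    """Map a user to a stable number from 0 to 99 for a given feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_enabled(feature, user_id):
    """True if this user should see the feature; unknown features stay off."""
    return user_bucket(feature, user_id) < ROLLOUT_PERCENT.get(feature, 0)


def checkout_page(user_id):
    # The new code is already in production; the toggle decides who gets it.
    if is_enabled("new_checkout", user_id):
        return "new checkout"
    return "old checkout"
```

Setting a percentage to 0 turns the feature off for everyone without a new deployment, and raising it step by step gives the gradual rollout described above. Hashing the user ID means the same user keeps getting the same answer between visits.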
And then the other technical practice they talk about is called blue-green deployment.
Basically, it means you constantly have two identical production environments. One of them is the one that the customers are currently using, so it’s active. The other one exists and is exactly the same as the active one, but it’s inactive.
And so you deliver whatever you’re building into that inactive system. Then you just turn one off and turn the other on; basically, you are switching them. Blue-green refers to which one is which: blue is the one that is sitting behind and is inactive, and green is the one that customers are using and have direct access to.
And so that minimizes the risk. Because if we deploy something and it breaks, it’s okay. Because the main environment is still working. Customers can still use our product. The only environment we have broken is the one that has been inactive.
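Here is a small Python sketch of that switch, just to make the idea concrete. The `BlueGreenRouter` class and the version numbers are invented for illustration; in real life the switch usually happens in a load balancer or DNS setting, not in application code.

```python
class BlueGreenRouter:
    """Tracks two identical environments and which one customers are sent to."""

    def __init__(self):
        self.versions = {"blue": "1.0", "green": "1.0"}
        self.active = "green"  # following the text: green is live, blue is idle

    @property
    def inactive(self):
        return "blue" if self.active == "green" else "green"

    def deploy(self, version):
        # New work only ever lands on the environment customers are not using.
        self.versions[self.inactive] = version

    def switch(self):
        # Swap roles: the idle environment goes live, the old live one goes idle.
        self.active = self.inactive

    def customer_sees(self):
        return self.versions[self.active]


router = BlueGreenRouter()
router.deploy("1.1")
before_switch = router.customer_sees()   # customers are still on "1.0"
router.switch()
after_switch = router.customer_sees()    # now they get "1.1"
router.switch()                          # rolling back is just another switch
after_rollback = router.customer_sees()  # back to "1.0"
```

Notice that the deploy step never touches what customers see. Only the switch does, and switching back is just as quick.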
With all of these technical practices, we are trying to make that flow as smooth as possible for the developers to get something into the hands of the customers.
Technical practices of feedback
What kind of technical practices do we want to implement to help us create the feedback loop?
A lot of what they talk about in this part is telemetry, which is basically collecting as many metrics as possible and reviewing them on a regular basis.
They talk about a lot of different applications that you can use that will basically pull the information from your deployment pipelines, from your code repositories, and will create graphs, and represent it in an easy-to-read way. They talk about how to set it up.
The idea here is not that we are looking at 10,000 metrics all at once and analysing them separately.
What we’re trying to see are any unexpected variations. So if we have a pretty stable flow of, for example, the number of bugs that are being created every time, then we should be fine. But if we see a spike, then immediately we can say, okay, there is a problem. We need to do something.
And so they are trying to collect as many metrics as possible. They talk about how to automate this process because, obviously, if you make it difficult to collect the metrics, nobody’s going to do it.
They also talk about ways developers can insert code into what they’re building so that those metrics can be easily picked up by all of those tracking applications.
They talk about dashboards with thousands of different metrics. And they would have code implemented in that telemetry that will just notify teams when something is wrong. When something is out of order, they will get a notification.
Obviously, when you have those metrics, you can understand where the problem is and focus your problem-solving very specifically on the problem area, because you have the metric that shows you exactly what you need to look into. And there are lots of different examples of what you can do.
And they basically say to measure everything, evaluate everything, and analyze it regularly. Use mean and standard deviation to detect potential problems. So those are some of the things that you want to do that enable feedback.
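To show what using mean and standard deviation could look like, here is a minimal Python sketch. The weekly bug counts are made up, and the rule of flagging anything more than three standard deviations from the mean is my own assumption for the example, not a number I am quoting from the book.

```python
from statistics import mean, stdev


def is_unusual(history, new_value, threshold=3.0):
    """Flag new_value if it is more than `threshold` standard deviations from the mean."""
    average = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is unexpected.
        return new_value != average
    return abs(new_value - average) > threshold * spread


# Hypothetical data: bugs reported per week over the last eight weeks.
bugs_per_week = [4, 5, 3, 6, 4, 5, 4, 5]

print(is_unusual(bugs_per_week, 6))   # False: within the normal range
print(is_unusual(bugs_per_week, 15))  # True: a spike worth looking into
```

This is the "unexpected variation" idea in miniature: nobody has to stare at the numbers every week, because the check only speaks up when something is far outside the usual range.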
Then they talk about hypothesis-driven development and A/B testing. This allows teams to focus more on that experimentation.
Then the other practices mentioned in relation to the principles of feedback are creating processes where people review each other’s work and coordinate together so that they can increase the quality of the work overall. Some of the easy things that they talk about are peer reviews. Honestly, I haven’t worked with a team that didn’t have peer reviews of their code.
That’s just become such a standard practice: one developer writes code, creates a pull request, and then asks other developers on their team to review it and comment on how they can improve it, or whether they made a mistake or didn’t follow the guidelines, whatever that is.
And then they talk about pair programming. Also, another practice that we are familiar with. Pair programming is where two people are working on the same item together. So you have knowledge transfer. They learn from each other. Especially if you have like a senior dev with a junior dev.
They also talk about minimizing or even eliminating change approval processes because they say it slows down development and creates a lot of frustration. If you have the right culture in place and you have telemetry implemented already, it’s okay not to have the extra approvals because you will be able to detect problems pretty quickly.
Then fearlessly cut bureaucratic processes. That’s one of those things, it is under technical practices, but in a way, it’s also just a normal thing to do if we are trying to implement Agile ways of working. Bureaucracy is definitely not helping, so we should eliminate it.
Technical practices of continual learning and experimentation
Or, in simple terms, continuous improvement. This is where they actually talk less about technical stuff and much more about process stuff. They call it technical practices, but honestly, I think the most technical practices are in the principles of flow, in the first way. Maybe in telemetry too.
The third way, in practice, is about enabling and injecting learning into daily work. It’s about creating a learning culture, and here’s where they talk about blameless postmortems. What they say is that if there is a problem, if something happened, we fix the problem first, obviously.
And then, right after it has been fixed, while our memories are still fresh, we need to do the blameless postmortem, where we identify what happened, and what are the issues that occurred that brought us to that problem without blaming anyone. Not only that, but we actually make the results of those discussions accessible to anyone in the company.
So anyone who wants to learn from our mistakes can do that. We publish postmortems as widely as possible to the whole organization.
They talk about failures again: we want to make failure more acceptable. Instead of looking at it as a failure, we look at it as a learning opportunity.
In addition, they encourage injecting production failures to enable resilience and learning. I talked about this earlier: we are actually trying to break our system to find any gaps and fix them before someone else breaks our system. They’re also talking about game days to rehearse failures, kind of like hackathons to break our system.
They talk about making tacit knowledge available to the whole organization. We are trying to make any kind of discoveries, especially when it comes to postmortems, easily available for everybody else in the organization so that they can learn from them, too.
They talk a lot about automation, like automating the capture of organizational knowledge. For example, chatbots or AI can gather a lot of that information and maybe create summaries. Well, they obviously didn’t have AI when they wrote it, but now we do.
Automate standardized processes in software for reuse.
If you have very simple processes, then instead of every team creating them on their own, we are able to give them to the teams. That library of knowledge is readily accessible to everyone. Create a single shared source code repository for the entire organization so that everybody uses the same libraries and the same standards, and that makes it much easier for the whole organization to operate. Nobody needs to create their own library or their own way of building the product.
They talk about communities of practice. They talk about creating reusable operations user stories in development. So if we know that we need to do something repeatedly, like every month, we have a reusable, well-explained work item and don’t have to recreate it every single time.
We know exactly what needs to be done.
And then they talk about reserving time to create organizational learning and improvement. Again, reserving time. Institutionalize rituals to pay down technical debt. They actually recommend putting aside 20 percent of the team’s time for technical debt.
So, if you’re doing sprints, make sure that every sprint you take in some of the technical debt work.
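As a quick worked example of that 20 percent, here is a tiny Python sketch. Measuring capacity in story points and using 40 points per sprint are my own assumptions for illustration; use whatever unit your team already plans with.

```python
def split_sprint_capacity(total_points, tech_debt_share=0.20):
    """Split a sprint's capacity between technical debt and feature work."""
    tech_debt_points = round(total_points * tech_debt_share)
    feature_points = total_points - tech_debt_points
    return tech_debt_points, feature_points


tech_debt, features = split_sprint_capacity(40)
print(f"Technical debt: {tech_debt} points, features: {features} points")
# prints "Technical debt: 8 points, features: 32 points"
```

So a team that usually takes on 40 points would plan about 8 points of technical debt work each sprint and 32 points of features.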
Again, these are not necessarily things that are technical.
That’s pretty much what DevOps is about. Of course, there are some more specific technical practices in here.
When we look up what DevOps is online, we mostly find technical stuff. It is kind of sad for me that DevOps became “how to set up cloud environments, deployment pipelines, and how to create automated tests” because there’s much more to it than just those technical practices. They are just part of it.
As a Scrum Master, you can use that knowledge to help your teams by creating an environment around them where if they decide to implement DevOps, it will actually work.
When it comes to facilitation we’re talking about blameless postmortem and improvement opportunities like retrospectives.
There are also concepts around reviews, like Sprint Reviews. Even product owners are mentioned in the book.
In conclusion: DevOps is not that far from Scrum
The more I learned, the more I saw the parallel between DevOps and other Agile concepts, values, and principles.
There are many elements of Scrum that are aligned with DevOps. The first way is fully aligned with the Kanban method.
Generally, it all comes down to creating an environment where collaboration thrives. Where customer needs are put forward. Where quality is essential.