PROCESS, COMMUNICATION, MINDSET: THE COMPONENTS OF A SUCCESSFUL DEVOPS CULTURE

In the technical literature, it is very often emphasized that DevOps is more than just a collection of tools and methods — namely, a new kind of organizational culture. And indeed, without the cultural factor, the convergence of development and operations remains an empty promise.

But what does “culture” mean in this context, and at what levels can it be defined, evaluated, and changed? And why does a successful DevOps transformation require an entire cultural change right away?

This article provides answers to these questions. It shows in a series of steps how the culture within an organization can be changed to practice DevOps successfully.

Many companies are trapped in a silo structure

The progressive division of labor has led to companies resembling bureaucracies today. The larger they are, the more bureaucratic their organization. Bill, the hero of the DevOps novel The Phoenix Project [1], learns this the hard way when he is promoted to head of IT operations after his supervisor unexpectedly resigns. Although his employer, a long-established automotive supplier, is experiencing a severe crisis, the individual departments and their managers work more against each other than with each other.

Consider the following scenario: An application goes into production after successful development and testing. Under real conditions, however, performance turns out to be much lower than expected. “Just get more and faster hardware,” comes the call from the developers’ corner. “Your code is just poorly written,” operations fires back. Lack of communication and “political” considerations are a natural expression of the silo structure, following the motto: the bigger the failure, the fewer the guilty parties (or at least those who admit responsibility).

Such disputes and difficulties in understanding are also repeated between IT and non-IT departments. Decision-makers in business departments such as sales, marketing, or finance are quickly frustrated when IT turns out to be a bottleneck for their initiatives or, worse, a nuisance. In The Phoenix Project, the blame lands on IT, or more precisely on IT operations, until the redemptive turnaround finally takes place. Before all the players come together around the table and agree on a common roadmap, the company comes perilously close to the abyss.

In recent years, it has become quite fashionable in the corporate world — not least due to agility and DevOps — to question silo structures. This is often combined with the demand for an open error culture and flat hierarchies. However, we should always keep in mind that the perception of roles and social status as depicted in the silo structure is fundamentally human and intuitive. It can be observed that in the corporate world, too, the perception of social rankings as well as “tribal” affiliations is very old and very deeply rooted. For the most part, it takes place unconsciously.

DevOps, as will be shown below, is often described as counter-intuitive. This is certainly true when it is measured against our accustomed corporate and organizational culture, which is based on perceptions of status. Status is anchored vertically, in the form of a hierarchy and chain of command (“delegate” versus “implement”), as well as horizontally, in the perception of another part of the organization as an “alien tribe” (“we” versus “they”). DevOps means largely putting this schema on the back burner. Only then will the added value be realized in the process.

Fig. 1 — In the “silo structure”, the effects of decisions on other departments and hierarchy levels are insufficiently considered. The reason for this is the lack of information flow. Graphic: Felix Schönherr

Here we find an initial answer to the question of why a cultural change is needed in the company for the successful practice of DevOps. The silo structure stems from the need to specialize and differentiate functional areas; thinking in terms of process chains takes a back seat. However, every company is a process chain, from the visionary idea to the satisfied customer. In the course of technological development over the last two decades, IT has taken on the central function in this process chain. It is therefore in IT in particular that the silo structure is reaching its limits.

Three steps lead to successful DevOps practice

An interesting aspect of the DevOps movement is that its principles do not come from IT. Rather, they have taken industrial manufacturing as their model, more specifically the “Kanban” and “Lean Management” approaches, which once originated on the factory floors of the Japanese automotive company Toyota.

Since this is assembly line work, the idea of the process chain is quite intuitively in the foreground: Department A hands over a sub-step to Department B, which in turn hands it over to Department C, and so on. If an error occurs at any part of the process chain, all the others are affected: If the error is at the beginning, the subsequent steps are no longer supplied. If it is at the end, the production also comes to a halt at the beginning because it can no longer transfer anything.

In lean management, we therefore talk about two types of customers: the “external customer” on the one hand, and the “internal customer” on the other. The latter term refers to the departments and stakeholders within the company that are affected by each other’s work results. For example, IT operations is affected by the results of software development because its job is to deploy the code in the existing hardware environments and get it running. So, development must treat operations as an internal customer and take its needs into account.

To make the transition, the authors of the DevOps Handbook [2] do not suggest a “hard” cut but a gradual approach to the new way of working in three steps, each of which deepens collaboration and information flow a little further:

The first step involves establishing an information and work channel between development and operations. The artifacts are to be handed over in smaller packages and at shorter intervals and, ideally, visualized. Kanban or Sprint Planning Boards are very well suited for this purpose. With such tools, all partial steps and results can be seen at a glance. The first step establishes an information flow that still mainly runs from “left” to “right”, that is, from development to operations. The aim is to establish a “single-piece flow” (or “one-piece flow”), a concept that also originates from the Toyota Production System. Here, attention is paid to a small lot size for each sprint, and each backlog item is passed through completely, instead of developing large programs first and then passing them on to operations all at once. To ensure a fast workflow in the process, it helps to give developers self-service access to (virtual) operating environments and automated deployment.
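
The effect of small lot sizes can be made concrete with a little arithmetic. The following Python sketch is an illustration only: the function names, the number of stages, and the assumption that every step takes the same amount of time are hypothetical and do not come from the DevOps Handbook or the Toyota Production System. It compares handing over a whole batch at once with passing each item through individually.

```python
def batch_flow(items: int, stages: int, hours_per_step: int) -> tuple[int, int]:
    """Each stage finishes the whole batch before handing it to the next stage.

    Returns (hours until the first item is finished, hours until all are finished).
    """
    # The batch spends items * hours_per_step in every stage before moving on.
    hours_per_stage = items * hours_per_step
    # The first item is finished one step after the batch reaches the last stage.
    first_done = (stages - 1) * hours_per_stage + hours_per_step
    all_done = stages * hours_per_stage
    return first_done, all_done


def single_piece_flow(items: int, stages: int, hours_per_step: int) -> tuple[int, int]:
    """Each item moves to the next stage as soon as its current step is finished.

    Assumes every stage works on one item at a time and all stages run in parallel.
    """
    # The first item only has to pass through each stage once.
    first_done = stages * hours_per_step
    # After that, one further item leaves the last stage every step.
    all_done = (items + stages - 1) * hours_per_step
    return first_done, all_done


if __name__ == "__main__":
    # Hypothetical numbers: 10 backlog items, 3 stages, 1 hour per step.
    print("batch:       ", batch_flow(10, 3, 1))
    print("single-piece:", single_piece_flow(10, 3, 1))
```

With these assumed numbers, the batch approach finishes its first item after 21 hours and the last after 30, whereas single-piece flow finishes the first item after 3 hours and the last after 12. Operations therefore sees the first result, and any problem it contains, far earlier.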

The second step adds a channel to the information flow from “right” to “left”, that is, from operations to development. This creates a loop of feedforward and feedback information. Its purpose is to make errors visible as quickly as possible so that, where possible, they never grow into a system failure. In the old days of the waterfall model, code was developed over several months or even longer before it entered the testing phase, that is, before anyone could even see whether it would cause problems in operation. DevOps has stepped in to shorten that cycle. The goal is always to trace the quality of work back to its source. To do this, feedback loops can be set up: for example, each developer opens a pull request for every completed task, which is then reviewed by a team member, or automated tests are run in the appropriate operational environment. Docker also suggests ingesting metadata that can be used to visualize the flow of work. [3]

The third step goes beyond the first two in its objective: while those pursue the smooth flow of production, the third is about innovating and evolving the sub-steps. Since working on a complex system such as an IT environment always involves a certain degree of uncertainty, such a process is necessary. For example, even the best developer cannot completely predict how a new code section will behave in operation. This is where the much-invoked “error culture” comes into play, as well as a “culture of innovation.” Development and operations must work together to figure out how a solution works best. Only when this understanding has arrived and is anchored in everyday practice can we speak of “DevOps” in the true sense. The authors of the DevOps Handbook logically refer to this as a “super-tribe”, which leaves behind the “us versus them” thinking outlined above. In this context, the streaming provider Netflix now also speaks of the “Full Cycle Developer”, who masters all phases of the software lifecycle, true to the motto: “Operate what you build”. To provide the necessary assistance with infrastructure, Netflix supports developers with specialized and centralized teams. These provide the appropriate tools and environments for the new tasks. [4] And to put the principle of “Move Fast, Fail Fast (and Small)” into practice, the Californians rely on the “Canary” test. New code is first introduced on only a small part of the infrastructure and is rolled out completely only if it proves successful there. [5]
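
The logic of a canary test can be sketched in a few lines of Python. This is a simplified simulation under stated assumptions, not a description of how Netflix implements it: the `Version` class, the `canary_decision` function, the 5 percent traffic share, and the tolerance threshold are all hypothetical names and values chosen for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class Version:
    name: str
    error_rate: float  # assumed probability that a single request fails


def observed_error_rate(version: Version, requests: int, rng: random.Random) -> float:
    """Simulate a number of requests and return the share that failed."""
    failures = sum(1 for _ in range(requests) if rng.random() < version.error_rate)
    return failures / requests


def canary_decision(
    stable: Version,
    candidate: Version,
    requests: int = 1000,
    canary_share: float = 0.05,  # hypothetical: 5 % of traffic hits the new code
    tolerance: float = 0.01,     # hypothetical: allowed extra error rate
    seed: int = 0,
) -> str:
    """Send a small share of traffic to the candidate and compare error rates."""
    rng = random.Random(seed)
    canary_requests = max(1, int(requests * canary_share))
    stable_requests = requests - canary_requests

    baseline = observed_error_rate(stable, stable_requests, rng)
    canary = observed_error_rate(candidate, canary_requests, rng)

    # Roll out completely only if the canary is not noticeably worse.
    return "promote" if canary <= baseline + tolerance else "rollback"


if __name__ == "__main__":
    stable = Version("v1", error_rate=0.0)
    print(canary_decision(stable, Version("v2-good", error_rate=0.0)))  # promote
    print(canary_decision(stable, Version("v2-bad", error_rate=1.0)))   # rollback
```

The point of the design is that a faulty release affects only the small canary share of requests before it is rolled back, which is what keeps the failure “small”.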

Andon: The DevOps warning system

To ensure early error detection, introducing an “Andon” system is a good idea. This method, unsurprisingly, also comes from the Toyota Production System. A warning light shows workers when problems have occurred elsewhere on the production line so that they can react accordingly. Likewise, they have a pull cord close to their workstation to signal a malfunction. This ensures quick feedback when errors occur, and the entire production line can concentrate on rectifying the fault. At the same time, it prevents parts from continuing down the line and work from piling up at the failed location. Transferring the Andon system to the DevOps context is primarily a matter of communication; if implemented successfully, it achieves a similar effect to the analog version.
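
How such a mechanism might look in a delivery pipeline is sketched below in Python. The sketch is an assumption-laden illustration rather than an established tool: `AndonCord`, `run_pipeline`, and the stage checks are hypothetical names, and a real setup would notify the team through chat or a dashboard instead of collecting strings in a list.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AndonCord:
    """Collects open alerts; while any alert is open, the line is stopped."""
    open_alerts: list[str] = field(default_factory=list)

    def pull(self, stage: str, reason: str) -> None:
        # Hypothetical stand-in for the warning light visible to everyone.
        self.open_alerts.append(f"{stage}: {reason}")

    def resolve(self) -> None:
        self.open_alerts.clear()

    @property
    def line_stopped(self) -> bool:
        return bool(self.open_alerts)


def run_pipeline(
    changes: list[str],
    stages: list[tuple[str, Callable[[str], bool]]],
    cord: AndonCord,
) -> list[str]:
    """Push changes through all stages; stop taking new work once the cord is pulled."""
    released = []
    for change in changes:
        if cord.line_stopped:
            break  # no new work is started until the fault is fixed
        for stage_name, check in stages:
            if not check(change):
                cord.pull(stage_name, f"{change} failed")
                break
        else:
            released.append(change)
    return released


if __name__ == "__main__":
    cord = AndonCord()
    stages = [
        ("build", lambda change: True),
        ("test", lambda change: "bug" not in change),
    ]
    print(run_pipeline(["feature-1", "feature-2-bug", "feature-3"], stages, cord))
    print(cord.open_alerts)
```

In this example only `feature-1` is released. The failing change pulls the cord at the test stage, and `feature-3` is not started until someone calls `cord.resolve()`, which mirrors the idea that the whole line concentrates on the fault first.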

Culture is a business asset

Culture is still seen, incorrectly, as a “soft” factor. Anyone who thinks of the term first and foremost in terms of the airy theses of self-proclaimed consulting gurus or useless courses of study should change their mind at the latest after dealing with DevOps. The preoccupation with culture in the production halls probably played no small part in Toyota becoming the largest automobile manufacturer in the world. Highly successful IT companies like Alphabet and Amazon have also become well known for cultural patterns they use to drive efficiency or innovation. Examples include Google’s 20 percent rule, which allows employees to allocate a portion of their work time to an innovation project, and Jeff Bezos’ meeting template in storytelling format [6].

To put it in management parlance, culture means scaling success methods across the organization by allowing employees to do their work in a certain way — without the need for ongoing directives or micromanagement intervention. Underlying the three steps outlined in the section above is a specific “culture” that is very different from the siloed structure mentioned at the beginning. To put it more generally: The culture is the “basis” for the methodological and organizational “superstructure.” So if you want to change the “superstructure” (i.e., tackle the DevOps transformation), you have to look at the “basis” as well.

One problem here is that culture is usually not consciously reflected on at all in companies that work according to the silo structure. While the silo structure is visible externally, for example in organizational charts, the culture is expressed subliminally in behaviors and agendas. Typical of the silo culture are (unspoken) mindsets such as “our department needs the resources more urgently than others”; “we are not responsible for customer satisfaction”; “we always have to clean up the other departments’ mess”; “if the other departments can’t cope with our results, it’s their fault”; and so on. Thus, it is not surprising when political intrigue paralyzes a company and lowers its productivity.

So, a key part of the DevOps transformation is to trigger thinking and talking within the organization about how we work and why we do it that way. This sounds kind of esoteric at first, but the main purpose is to increase acceptance of new methods and processes, as well as to teach how to use them correctly. Cultural transformation is an essential part of this! Introducing new tools or even renaming roles, while everything remains the same, is not compatible with the DevOps idea.

The challenge of cultural transformation

As mentioned earlier, DevOps is counter-intuitive: less control by the upper hierarchy levels leads to more control through mutual feedback or “checks and balances”; fewer tasks through smaller work packages lead to more completed tasks through the more efficient process; less security through streamlining of policies and smaller patch cycles leads to more security through “security by code” and test automation.

The cultural change mentioned above is therefore anything but trivial — after all, it involves nothing less than a fundamental re-learning process for the workforce. So, what is the best way to approach this? The following steps can help (although this list does not claim to be exhaustive):

1. Create incentives to broaden one’s horizons: For DevOps to work, development and operations must have a solid understanding of what the “other” side does day to day and the problems it faces. The same goes for the relationship between IT and business. To create a common understanding, it’s a good start for employees to look over the shoulders of colleagues from other departments, at least for a short time. Internal job shadowing based on the rotation principle is a suitable basic exercise for this. Internal training programs can also be established, for example, lecture evenings in which each team presents its work to the others.

2. Align your actions with the process chain: In silo structures, the view is usually narrowed to what the immediate superiors want and what they like. In a DevOps culture, on the other hand, every team member should align their actions with the needs of the “internal” and “external” customers. This also applies to organizational units that are usually far away from the end customers. After all, their salaries are also paid by the end customers. It is also important to develop empathy for the other organizational units and departments, i.e., the “internal” customers, and to keep their needs in mind. In line with the motto: A company is only successful as a functioning process chain.

3. Anchor error transparency and control: From a social perspective, the realization that I have made a mistake is strongly associated with feelings of shame and guilt. Usually, whoever has done something wrong “must” be disciplined or at least reprimanded. Admitting a mistake hurts and eats away at self-confidence. So, the temptation is to shift the responsibility to someone else. This can be done upward (“Management is to blame because it… …didn’t give us the right resources, …once again set its requirements too high, …never listens to us, …etc.!”); or downward in the form of a “pawn sacrifice.” This aspect of culture change is probably the most difficult because it targets deeply embedded patterns of perception. Among the most effective countermeasures here is an awareness of shared responsibility for the company’s success, as well as the realization that every mistake identified and reported early is a success.

4. Counteract overwork and stress: The “shift left” approach in DevOps brings more tasks and responsibilities, especially for the developer side. Developers are now contacted directly when something is not working in operation and must provide ad hoc solutions. Also, the much shorter update and release cycles often result in overtime as well as night and weekend shifts. This makes it all the more important to keep the stress level for employees as low as possible. First, the distribution of work tasks should be clearly structured. Routine tasks should be automated as far as possible so that employees have more time for their core tasks. Second, especially in an environment with high productivity and speed such as DevOps, it is important to reduce unplanned work through the right processes. After all, it is unplanned work that usually leads to stress and overwork. It is helpful, for example, to separate deployment and release from each other and to introduce a feedback round before the release. If an error occurs during deployment, there is still enough time to fix it before it is visible to the user. Similarly, early testing and validation phases can prevent the build-up of technical debt that later has to be paid down through unplanned additional work.
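
One common way to separate deployment from release, mentioned in the last point, is a feature flag: the new code is deployed but stays switched off until it has been checked. The following Python sketch shows the idea under stated assumptions. The `FeatureFlags` class, the `new_discount` flag, and the 10 percent discount are hypothetical examples, not part of any particular tool.

```python
class FeatureFlags:
    """Minimal in-memory flag store (a real system would use config or a service)."""

    def __init__(self) -> None:
        self._enabled: set[str] = set()

    def enable(self, name: str) -> None:
        self._enabled.add(name)

    def disable(self, name: str) -> None:
        self._enabled.discard(name)

    def is_enabled(self, name: str) -> bool:
        return name in self._enabled


def checkout_total(prices_in_cents: list[int], flags: FeatureFlags) -> int:
    """Return the order total; the new discount logic is deployed but flag-guarded."""
    subtotal = sum(prices_in_cents)
    if flags.is_enabled("new_discount"):
        # New behavior: hypothetical 10 % discount, visible only after release.
        return subtotal * 90 // 100
    # Old behavior: what users see while the feature is merely deployed.
    return subtotal


if __name__ == "__main__":
    flags = FeatureFlags()
    print(checkout_total([1000, 2000], flags))  # deployed, not released: 3000
    flags.enable("new_discount")                # release after the feedback round
    print(checkout_total([1000, 2000], flags))  # released: 2700
    flags.disable("new_discount")               # quick rollback without redeploying
```

Because the release is just a flag change, an error found after deployment can be fixed, or the flag simply left off, before any user sees the new behavior.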

Sources

[1] Gene Kim, Kevin Behr, and George Spafford: The Phoenix Project. A Novel About IT, DevOps and Helping Your Business Win, 2018 (first edition 2014).

[2] Gene Kim, Jez Humble, Patrick Debois, and John Willis: The DevOps Handbook, 2016.

[3] John Willis, Docker and the Three Ways of DevOps Part 2: The Second Way — Amplify Feedback Loops, 2015 https://www.docker.com/blog/docker-three-ways-devops-2/ (last accessed 12/07/2020)

[4] Netflix Technology Blog, Full Cycle Developers at Netflix — Operate What You Build, 2018. https://netflixtechblog.com/full-cycle-developers-at-netflix-a08c31f83249 (last accessed 07/12/2020)

[5] Netflix Technology Blog, Deploying the Netflix API, 2013. https://netflixtechblog.com/deploying-the-netflix-api-79b6176cc3f0 (last accessed 07/12/2020)

[6] In one of his annual letters to Amazon management, Jeff Bezos prohibited the use of PowerPoint. Instead, meeting participants should create so-called “narratives” that outline an idea on four to six pages and are delivered in the form of a story. This makes meetings more productive and leads to faster and more binding decisions. See for example https://www.focus.de/wissen/experten/narrative-statt-folien-was-wir-von-jeff-bezos-powerpoint-verbot-lernen-koennen_id_8899586.html (last accessed on 12.07.2020)