In today’s world, acronyms are everywhere! A three-letter acronym (TLA) can mean different things: OSS could mean Open Source Software, Operational Support System (in a telco context), Open Sound System (Unix), or something else. And as you know, there are plenty of other examples out there. So, you understand why it’s important to set the context in your organization too. Here’s an example from a previous employer of mine: PS stood for professional services, as well as pre-sales. You can imagine how many unnecessarily confusing situations that caused. So, my recommendation is to kill not Bill, but ambiguity. This is worth it. The practices we will introduce in Chapter 5, Building Your Distributed Technology Operating Model, will help you achieve this.
In this first chapter, we will provide some context and a description of what we mean by the terms and terminology we use. We are slaying buzzwords right here, right now. Let’s get to it.
Strategy
A strategy is an integrated set of choices an organization makes, without really knowing if they work. A strategy is a set of hypotheses that you think will help you win on a playing field of your choosing. So, a strategy is based on a theory. That theory should be coherent and executable by the people in your organization right now. What winning looks like is defined by the strategic goals you set. Ideally, a strategy is communicated to your colleagues so that, as a team, you can all pull in the same direction. As an example of choosing a playing field, Amazon decided to extend its playing field beyond an online bookstore: first to become “The Everything Store” and then to become a public cloud service provider.
To clarify how to develop and set an operating model in the context of our strategy and goals, we can utilize an existing framework from the Business Motivation Model (BMM). The BMM, published by the Object Management Group (OMG), includes the Means to End framework. Means is the action plan, while End is the desired result or aspiration. You can study the BMM meta model via the link provided in the Further reading section and learn about the entities and relationships in more detail, but it’s not necessary to do so for this book.
The Means to End framework aims to put concepts such as Mission, Vision, Strategies, Goals, Objectives, and Tactics into context and defines a common language. A common language is a very powerful enabler. The information exchange, and hence the learning and understanding, that a common language enables across teams, even teams within the same organization, is phenomenal.
I’ve run many workshops where this simple framework created a lot of clarity for the customer’s team, which is why I recommend it. I also added a link in the Further reading section in case you want to facilitate a workshop yourself.
Introducing the Means to End framework (see Figure 1.1) to workshop participants is an effective way to link their vision (for example, we want to be a digital bank with a brick-and-mortar experience) and mission (we prioritize building out our digital CX), as well as their strategy, to goals, and to distinguish between strategic goals (for example, grow assets under management beyond $80 billion for a bank) and associated tactical objectives (for example, automate 100% of the loan origination process). At this point, a valid tactic could be to fund a project that digitizes the entire loan origination process, and the associated strategy could be to build out straight-through processing for all asset-related customer touchpoints:
Figure 1.1 – The Means to End framework
So, even though we ultimately talk about strategy, let’s take a quick detour to see how strategy is connected to the other elements you encounter in your organization. This will be useful later when we define the “success criteria” – that is, our goals and objectives – as we move toward our target state operating model. Let’s quickly go through the different elements of the framework and give some examples of what we mean by that:
- A vision represents an organization’s future and answers the question of who we are going to be 2, 3, or 5 years from now.
- The mission is the means to achieve the vision (end) and sets the direction by stating what the organization does daily to achieve its vision.
- Goals are connected to the vision because the goals that have been set need to align with your organization’s vision. Because we are in the “strategic” layer, goals are strategic and hence answer the question, “What strategic goals do we need to hit to make this vision a reality?” Goals are longer-term but should be narrow enough and have qualitative definitions so that objectives can be created for them.
- Strategies are the means we choose to achieve our strategic goals (end). In this layer, we are figuring out what high-level approaches (programs of work, products, or services) and hypotheses are being funded to achieve our goals. Strategies are usually broad in scope and long-term compared to tactics. Think of a program and product instead of a project. As you can see, the mission informs the strategies.
- Objectives are steps toward a goal. They should be specific and of a quantitative nature with an end date so that you can ascertain whether the objective has been reached or not. Objectives need to be linked to strategic goals; otherwise, you need to ask yourself: Why am I doing this?
- Finally, we have tactics. What tactical projects or tasks (means) do we employ to achieve our objectives (end)? Tactics are short-term and narrower in scope – think project or feature rather than program or product. Strategies inform the tactics, and the tactical objectives should support the strategic goals. To summarize, every objective you achieve brings you closer to reaching the associated strategic goal.
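The relationships described in the list above can be sketched as a small data model. This is only an illustration, not part of the BMM specification: the class names, the `orphaned_objectives` helper, the end date, and the unlinked "intranet wiki" objective are all hypothetical, while the bank goal, objective, and tactic are the examples used earlier in this section.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Goal:
    """Strategic end: longer-term and broad in scope."""
    name: str


@dataclass
class Objective:
    """Tactical end: a specific step toward a goal with an end date."""
    name: str
    due: str                      # end date kept as plain text in this sketch
    goal: Optional[Goal] = None   # None means "not linked to any strategic goal"


@dataclass
class Tactic:
    """Tactical means: a short-term project or task serving one objective."""
    name: str
    objective: Objective


def orphaned_objectives(objectives: List[Objective]) -> List[str]:
    """Return the names of objectives that are not linked to a strategic goal."""
    return [o.name for o in objectives if o.goal is None]


# The bank example from the text; the due dates are assumptions.
grow_aum = Goal("Grow assets under management beyond $80 billion")
automate_loans = Objective(
    "Automate 100% of the loan origination process", due="2026-12-31", goal=grow_aum
)
digitize_project = Tactic(
    "Fund a project that digitizes the entire loan origination process",
    objective=automate_loans,
)
# A hypothetical objective that nobody linked to a goal.
wiki_move = Objective("Migrate the intranet wiki", due="2026-06-30")

unlinked = orphaned_objectives([automate_loans, wiki_move])
print(unlinked)
```

Any objective the helper returns is one where the question "Why am I doing this?" has no answer yet.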
But I need to utter a word of warning: you can run into difficulties distinguishing between the strategic and tactical layers at times during workshops. A pro tip is to keep in mind that strategies and goals are usually longer-term and broader in scope. Tactics, on the other hand, are shorter-term and narrower in scope. The following diagram shows how you can outline the context of your strategy on a single page:
Figure 1.2 – Visualizing the big picture – a single-page overview of strategic context
And how is this related to the cloud operating model I came here for, you might ask? Great question! A business operating model needs to support the company’s strategy. For example, if you want to grow revenue and venture into new customer segments by bringing new features or products to market faster while utilizing Agile and microservices, but your IT operating model is set up to stabilize your systems of record, you might end up wasting lots of effort and money. So, the operating model needs to align if you want to be efficient and effective.
And the same is true for your cloud operating model and strategy. If you want to reduce your time to market, attract and retain talent, reduce OpEx, be more innovative, reduce tech debt, or improve your bottom or top line, then you need to do more than just select a hyperscaler to run on.
In short, your operating model needs to encompass things such as funding (project or product?), team setup (Conway’s law or Dunbar’s number?), platform (where and what to abstract?), cultural practices (Open Practice Library and/or DevSecOps?), and much more. This is the core of your cloud operating model. We will cover this in more detail in Chapter 5.
Capability
Capability is, in general, defined as the power or ability to do something. The Open Group Architecture Framework (TOGAF) defines it as an ability that an organization, person, or system possesses. Product features can sometimes be referred to as capabilities. However, product or system features are not what we mean in this book when we say capability. We mean the organizational capability that enterprise architects refer to when people, processes, and technology are in place and form the ability of an organization to do something. Such a capability could be product marketing, processing an insurance claim, or employing DevSecOps practices. If an enterprise has any of those capabilities, then the people, processes, and technologies are in place. And that’s what we mean in this book when we refer to “capability.”
About culture and why we are recommending open practices
Sociologist Dr Ron Westrum has defined different cultural typologies within organizations and his research has shown how culture affects performance. The different typologies and the “cultural features” or behaviors you can observe are depicted in the following table, but to summarize, there are three culture types:
- A pathological (power-oriented) culture is characterized by large amounts of fear and threat. People often hoard information or withhold it for political reasons or distort it to make themselves look better.
- In bureaucratic (rule-oriented) cultures, organizations protect departments. People in the department want to maintain their “turf,” insist on their own rules, and generally do things by the book – their book.
- Generative (performance-oriented) cultures help organizations focus on their missions and goals. A generative culture allows people to openly ask, “How do we accomplish our goal?” Everything is subordinated to good performance and doing what needs to be done to get things done.
These are all specific features of Westrum’s cultural categorization. Overall, the mission and goals take precedence in a generative or performance-oriented culture. And that’s certainly what you want if you want a far-reaching concept such as an operating model to come alive.
If we home in on specific features within that cultural typology, Westrum found that a performance-oriented culture displays behaviors such as high cooperation, novelty being implemented and welcomed, and information being freely available. And that, beautiful people, is exactly what open source is all about. Here is a table outlining the behaviors that can typically be observed in each of the cultural types:
Figure 1.3 – Westrum’s cultural typology and associated behaviors observed
Git repositories contain freely available information where anyone is welcome as a contributor, new ideas are discussed, and new features and bug fixes are implemented and merged via pull requests. So, how do we transfer this code-centric approach to workshops, status meetings, brainstorming and discovery sessions, strategy discussions, and finally into building a cloud operating model?
The answer is open practices.
Westrum also observed that organizational culture affects how information moves within an organization. Westrum provides three characteristics of good information:
- It provides answers to the questions that the receiver needs answering
- It is timely
- It is presented in such a way that it can be effectively used by the receiver
Open practices enable and create “good information” through globally proven practices. These practices are freely available, in an open source manner, through the Open Practice Library to everyone who needs to facilitate workshops, discussions, or brainstorming sessions, or to drive consensus or innovation. The Open Practice Library uses a modified Mobius Loop to sort practices into four categories: Foundation, Discovery, Delivery, and Options. A Mobius Loop is a horizontal figure of 8 representing an infinity loop; it sits on top of the Foundation practices and iterates through Discovery, Delivery, and Options indefinitely.
For each part of creating the operating model, I will recommend specific exercises that you can run with your team and stakeholders. This will help you establish an open culture, utilize the wisdom of the crowd (as opposed to following a detrimental Highest Paid Person’s Opinion (HiPPO) approach), and create a sense of ownership in your organization.
And that’s because people – especially the people who do the work – have a say instead of just being a recipient of decisions – in our case, decisions around a (new) cloud operating model. Being a passive recipient is – in my experience – more likely to create change resistance than excitement and ownership. And a sense of ownership is what we need if we want to create and, subsequently, iteratively improve our cloud operating model.
Operating model
First of all, an operating model and an operations model are two different things. When we talk to organizations, we often start with this statement as it clears up potential misunderstandings right from the start. They are related, but different. There is no single place to go look up the definition of what a best-practice operating model is. And worse, each management consultancy has different definitions, usually as a competitive differentiator. Let’s have a quick look at the definition of an operating model.
An operating model is a plan or system that outlines how an organization will function and achieve its goals while delivering value for its customers. It defines the processes, resources, structures, management approaches, and systems that are needed to support the operations of the business. It also outlines the roles and responsibilities of different teams and individuals within the organization, as well as the relationships between them. An operating model is an important tool for ensuring that an organization can operate efficiently and effectively, and it is used to support strategy and decision-making.
An operating model is divided into different categories or dimensions in a divide-and-conquer kind of fashion. This helps us focus on specific aspects within the vast spectrum of things to consider while running an organization.
Even though our ultimate focus in this book is on distributed technology operating models in the context of hybrid multi-cloud and the edge, it’s good to start by looking at more general business operating models first and how different subject matter experts define them. The learnings and observations you gain will make it easier for you to work on a cloud operating model and its associated dimensions.
Operating model dimensions
The dimensions of an operating model help you categorize specific topics so that you can home in on them. As I said previously, there is no set of globally agreed-upon dimensions. Let’s take a look at what some expert management consultancies suggest to help you categorize the problem space.
Accenture
Accenture, in their proposal of a “resilient operating model,” uses the following categories:
- Agile governance organizes the workforce and transitions the way work is done, promoting a culture of experimentation and innovation and faster decision-making and approval cycles, and extending to how performance is measured.
- Taking a two-pronged technology approach is about investing in new and sweating existing technology assets on an incremental and ongoing basis to continuously evolve their capabilities.
- Configure and reconfigure talks about creating squads, pods, or cells that operate like discrete businesses within the organization to help boost agility and responsiveness for specific products and services.
- Invigorating the ecosystem includes reevaluating partners, channels, and services and developing capabilities to help achieve long-term objectives such as driving innovation, developing new products and services faster, entering new markets, or being more agile.
- Decision-making at the edges is about combining a real-time data access capability with the cultural shift to empower employees to make decisions based on what they see in the data.
- Reskill, reskill, reskill suggests what most of us technologists know already: we are never done learning. It’s about building a culture of continuous adaptation and building and rebuilding the skills of employees, including the skills required to work with the latest technologies to create a human + machine mindset.
KPMG
Now, let’s look at what KPMG (one of the Big Four accounting organizations) has to say. KPMG calls their operating model the target operating model (TOM), which is a concept we will also look at for our technology operating model later in this book. Here, we map out what the target state is in certain areas – for example, product-based budgeting – and then assess where we are to find out how to get to the target. We’ll cover this in more detail later in this book.
So, let’s go back to our friends at KPMG. The KPMG TOM dimensions are as follows:
- Functional process
- People – who does what, reporting lines, required skills, roles, and responsibilities
- A service delivery model, such as a shared service center, Center of Excellence (CoE), outsourcing, and service delivery optimization
- Technology relates to applications and integrations that enable processes with cloud architects, integrations, conversions, and test scripts
- Performance insights and data covers the what and how of reporting, the associated information requirements, and KPI frameworks to optimize decision-making
- Governance to cater for oversight and define risks and controls, segregation of duty, and access rules and policies
The context of the KPMG TOM is enterprise transformation, and it has a strong process focus. The KPMG TOM also comes with blueprints, including process maps. In one of my last enterprise transformation programs, we had several larger management consultancies involved, but for process mapping, we decided against the use of any proprietary process maps. We went with APQC and their industry frameworks as a baseline instead.
Forrester
So, what’s Forrester saying? The latest edition is about an operating model centered around “customer obsession.” So, the customer operating model talks about the following dimensions:
- Strategy
- Vision
- Culture
- Performance
- Corporate values
- Motivation
Then, it dives into sub-dimensions such as accountability and compliance. Only a few layers down, we get to more tangible topics such as operating units, location, reporting lines, infrastructure, applications, people, data, and processes, and finally to the customer journey, customer experience, product and service offerings, and value propositions – perhaps too many things to be practicable. But it’s not easy to consolidate so many important aspects into the right amount and the right dimensions, as we will see later. The Forrester context is the IT operating model but with a focus on customers. We dig customer focus. A lot.
McKinsey
Lastly, before we wrap things up, let’s have a look at McKinsey, which has three high-level dimensions called People, Processes, and Structure, and then lower-level sub-dimensions all centered around Strategy:
- People:
  - Informal networks
  - Culture
  - Talent and skills
  - Workforce planning
- Structure:
  - Roles and responsibilities
  - “Boxes” and “lines”
  - Boundaries and location
  - Governance
- Processes:
  - Process design and decisions
  - Performance management
  - Systems and technology
  - Linkages
In summary, regardless of what different names different experts use, it’s safe to say that for an organization to deliver value to its customers, the following operating model dimensions must exist under one name or another:
- Process
- Organization
- Location
- Information
- Supplier
- Management systems
All these operating model dimensions should contribute to an organization’s value chain. If this sounds a bit like a business model now, then have a look at Figure 1.4:
Figure 1.4 – Business versus operating model
An operating model and a business model are two related but distinct concepts:
- A business model is a high-level framework that describes how an organization creates value for its customers. It defines the value proposition, target market, revenue model, and cost structure, and outlines how these elements work together to create a sustainable and profitable business.
- An operating model is a more detailed framework that outlines how organizations execute their business model. It defines the organization’s strategy, structure, processes, people, technology, and governance, and outlines how these elements work together to deliver value to customers and stakeholders.
In other words, while a business model describes what an organization does and how it generates revenue, an operating model describes how it manages resources, processes, and activities.
Our little excursion into the world of operating models also showed that it can be quite frustrating if you are looking for the one and only operating model, simply because it doesn’t exist. And that is true for general business and organizational operating models as much as it is for cloud operating models.
If we look at it differently, it’s quite liberating and reassuring that we have the freedom to create and employ the best-fit operating model for ourselves. And the best thing is that we can involve our peers, teams, managers, and direct and indirect reports to ensure we create something that is a) fit for purpose and b) something people feel a sense of ownership of. And if people have a sense of ownership, we have a chance of getting our distributed cloud operating model adopted.
And just to finish up, in case you are interested in what a meta-model looks like and to be a bit more scientific, the operating model can extend the business motivation meta-model to show and describe the relationship between the different entities that play a role in defining an organizational and, ultimately, hybrid cloud operating model:
Figure 1.5 – Business motivation model with our operating model extension
As you can see, the operating model entity is connected to the strategy. This is because we said it needs to support the strategy – remember the cloud-native microservices versus mainframe example from earlier? The operating model entity is also related to organizational capability. This is because if the organization doesn’t have the required capabilities – that is, people, processes, and technology – then it can’t effectively and efficiently operate and execute projects or larger programs of work. This, in turn, means that organizations that need a DevSecOps capability to reach their distributed cloud operating model have to build it.
Operating models for hybrid cloud and the edge
Earlier, we looked at business operating models. We could now repeat the exercise from the previous section and see what global system integrators and consultancy companies offer in terms of cloud operating models to inform our distributed technology operating model (including the edge), but we would find the same thing: there is no single, agreed-upon model.
And while hybrid or multi-cloud is a reality and a necessity for most organizations, we want to go further. We want to incorporate edge computing. Edge computing is becoming pervasive across all industries. There are obvious use cases in this category, such as self-driving cars or mobile phone towers as part of the telco provider’s edge radio access network (RAN), as well as operational edge scenarios for monitoring manufacturing plants.
A hybrid cloud and edge operating model is a hybrid cloud operating model extended to the edge. It involves the use of both public cloud services and a private cloud or on-premises infrastructure, as well as edge computing locations.
As part of a hybrid cloud and edge operating model, an organization can choose to run certain workloads on the public cloud, others on the private cloud or on-premises infrastructure, and still others on edge computing resources, because the operating model dimensions cater to this scenario.
A hybrid cloud and edge operating model allows organizations to adjust their computing and data deployments to the specific needs of their users and customers. This is because the relevant distributed security and compliance posture management, development and operational capabilities, architecture, funding, and skills are available to make use of the most appropriate resources for each workload to achieve the desired speed, flexibility, scalability, and cost-effectiveness.
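To make the idea of choosing the most appropriate resources for each workload more tangible, here is a minimal sketch of a placement rule. Everything in it is an assumption made for illustration: the `Workload` attributes, the 20 ms latency threshold, and the order of the rules are hypothetical, and a real decision would also weigh funding, skills, security, and compliance posture as described above.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    max_latency_ms: int          # latency the users of this workload can tolerate
    data_must_stay_onsite: bool  # for example, a compliance or sovereignty requirement
    bursty: bool                 # demand varies a lot, so elasticity pays off


def place(workload: Workload) -> str:
    """Pick a deployment target using simple, hypothetical rules."""
    if workload.max_latency_ms < 20:      # assumed threshold for "needs the edge"
        return "edge"
    if workload.data_must_stay_onsite:
        return "private cloud / on-premises"
    if workload.bursty:
        return "public cloud"
    return "private cloud / on-premises"


# Hypothetical workloads
plant_monitoring = Workload("plant-monitoring", 10, True, False)
core_ledger = Workload("core-ledger", 200, True, False)
web_shop = Workload("web-shop", 150, False, True)

for w in (plant_monitoring, core_ledger, web_shop):
    print(w.name, "->", place(w))
```

The point of the sketch is not the rules themselves but that such rules can only be applied when the operating model makes all three targets genuinely available.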
Hybrid cloud computing services provide on-demand access to computing resources such as storage, networking, and processing power over a WAN. This allows the organization to request resources as needed, pay for only the resources it uses, and benefit from the flexibility, agility, and cost-efficiency of the elasticity (scale-out/scale-in) of the cloud. However, this requires suitable workloads and a technology operating model that is ready to leverage this opportunity.
Distributed (as opposed to centralized and single cloud) is key here. In the context of hybrid cloud and edge computing, “distributed” refers to deploying computing resources across multiple locations, including on-premises data centers, far or near edge locations, and public cloud-based environments.
The foundation of a hybrid cloud and edge operating model includes the following elements:
- Distributed architecture: The design and layout of an organization’s distributed cloud and edge-based systems and services, including how they are connected and interact with one another
- Distributed infrastructure: The hardware, software, and networking resources that an organization uses to host its applications and data in a distributed manner, leveraging the benefits of cloud and edge computing and other distributed technologies
- Distributed security: The measures and controls that an organization puts in place to protect its distributed cloud and edge assets from threats and vulnerabilities
- Distributed management: Control plane processes and tools that an organization uses to monitor, optimize, and maintain its distributed cloud-based systems and services
Proactively creating a fit-for-purpose cloud and edge operating model enables organizations to ready themselves for the distributed future. Doing so cultivates the potential to outdo the competition. Outdoing the competition can take many forms: being more agile, quicker to market, more cost-effective, more secure, working with less risk, and so on.
We can build on the infrastructure layer and define the operating model for the distributed future as a set of practices, processes, and technologies that an organization uses to operate and manage its applications, data, security, and compliance posture on top of distributed infrastructure. A “distributed infrastructure” simply means a mix of on-premises, private, and public cloud, as well as edge locations.
We believe – and research suggests – that the future will be highly distributed and that organizations are likely to rely more heavily on combinations of cloud computing, edge computing, and other distributed approaches to deliver their products and services related to customer experience. This may involve leveraging the scalability, reliability, and cost-efficiency of a private and public cloud, as well as the low latency aspects of edge computing.
So, why do we need an operating model? Because an operating model allows us to look at different aspects of our execution, which enables strategy and business model alignment in a reusable manner with a focus on “moving target” outcomes. We shall call those aspects dimensions henceforth.
While organizational operating models balance integration versus standardization, localization versus globalization, or centralization versus decentralization, the technology operating model balances the same kind of tension between innovative freedom and operational excellence. More details on this will be provided in Chapter 5 and Chapter 6.
Before we dive into the details of how you can craft your own fit-for-purpose distributed technology operating model, perhaps you should start thinking about what aspects or dimensions you would choose if it were all up to you.
Because we are still in the “foundations” part of this book, next, we will define a few more terms to provide clarity for later sections of this book.
Engineering and operations
Just a word of warning: it can get blurry!
Software product engineering and operations (including support and maintenance) are related but distinct activities within the field of software development. In the early days, at least. Then, Dev{X}Ops came along, and it got harder to talk to one another about this topic.
For the sake of this book, we’ll define three things:
- A blurry universal truth about engineering and operations
- Platform engineering (where the buzzword SRE lives)
- Product engineering (where DevOps is at home)
Platform and product engineering both contain operations aspects in their respective fields:
- Platform engineering owns platform operations, support, and maintenance
- Product engineering owns product operations, support, and maintenance
This distinction is important because it helps clarify practices such as DevOps, DevSecOps, and Site or Service Reliability Engineering (SRE). In modern organizations, product and platform teams each have a product manager, a backlog, and their own practices. So, for example, the platform team could employ an SRE approach, work with their internal customers (the product teams) on defining service-level agreements (SLAs) and service-level objectives (SLOs), and collaborate on selecting the most appropriate service-level indicators (SLIs), which indicate potential customer experience (CX) impact and hence help raise alarms before CX is impacted.
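As a rough illustration of how an SLI can raise an alarm before an SLO is breached, the following sketch computes an availability SLI and the share of the error budget that remains. The function names, the 99.9% objective, the 25% alert threshold, and the request counts are assumptions made for this example; they are not values prescribed by this book or by any SRE standard.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests served successfully in the measurement window."""
    if total_requests == 0:
        return 1.0  # no traffic, so nothing failed
    return good_requests / total_requests


def error_budget_remaining(sli: float, slo: float) -> float:
    """Fraction of the error budget left: 1.0 is untouched, 0.0 or less is exhausted."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo   # how much failure the SLO tolerates
    actual_failure = 1.0 - sli    # how much failure was observed
    return 1.0 - actual_failure / allowed_failure


def should_alert(sli: float, slo: float, threshold: float = 0.25) -> bool:
    """Alert while the SLO is still met but little error budget is left."""
    return error_budget_remaining(sli, slo) < threshold


# Hypothetical numbers: a 99.9% objective and 80 failed requests out of 100,000.
slo = 0.999
sli = availability_sli(good_requests=99_920, total_requests=100_000)
print(sli >= slo, should_alert(sli, slo))
```

In this example, the SLO is still met, yet roughly 80% of the error budget has been consumed, so the platform team is alerted before its internal customers notice an impact.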
The dependent product teams use this platform as a service to build their products and service offerings on top and might use different tooling or different approaches such as DevSecOps. Some teams might use a GitOps approach, while others might not.
And this is where the operating model comes in. In the platform and product team example we provided, there were questions about team structure, team collaboration modes, funding, location, and practices. These are all questions that the operating model provides guidance on.
In general, software product engineering focuses on designing and developing new software products, while support, maintenance, and operations focus on maintaining and supporting existing products. However, there is potentially an overlap between the two activities, particularly in organizations where software product engineers are also involved in maintaining and supporting the products they develop (DevOps). We just saw an example of this. And that’s why I called it blurry.
The takeaway is that while there is a “best practice” definition, it’s always recommended to clarify the actual implementation of engineering practices involved in delivering products and services to production.
Platforms
Despite sometimes not being liked because they introduce an additional “abstraction layer,” platforms are generally popular because they address several of the aspects we have mentioned so far:
- Allow sovereign use and implementation of infrastructure, software, services, and data
- Allow for consistent and SLA-conformant CX design
- Allow for the required industry-specific compliance and security stance
- Shift security and compliance left without additional burden for developers and product teams
- Allow for a heightened level of reuse and sharing of tools, processes, and digital assets, such as patterns
- Allow the cognitive load of product development teams to be minimized
Having a dedicated platform team that focuses on increasing the productivity of product teams, especially in a distributed technology context, is something organizations need to consider. Without one, an organically grown mess, technical debt, and undifferentiated heavy lifting across different tools, services, products, processes, and skills are almost guaranteed, especially in the context of hybrid multi-cloud and the edge.
A platform is a foundation for self-service and a consistent CX. It makes skills, documentation, and processes reusable across both the platform team and the product development teams. The trick is to find the right abstraction layer for compute, network, storage, and databases.
Guiding principles and guardrails
Guiding principles and guardrails are both used to provide guidance and structure for decision-making and behavior. However, they serve slightly different purposes.
Guiding principles are the softer of the two. They are statements that describe the values and beliefs that guide decision-making and actions within an organization. They help establish a shared understanding of the organization’s goals and priorities and provide a framework for making decisions that align with those goals. For example, a guiding principle might be “employee safety first,” or “design for workload portability.”
Guardrails, on the other hand, are specific, hard guidelines or constraints that are put in place to help ensure that actions and decisions are aligned with the guiding principles. They serve as boundaries that help prevent actions that could lead to negative outcomes or that conflict with the organization’s goals. For example, a guardrail might be a policy that requires all new container-based applications to be built from a certified container base image, and to be security scanned and signed before being deployed.
In essence, guiding principles are the overarching values and beliefs that guide an organization, while guardrails are the specific rules and policies that help ensure that those values and beliefs are put into action. Together, they provide a framework for decision-making and action that aligns with the organization’s goals and priorities.
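To make the difference concrete, the container guardrail mentioned above could be enforced as an automated check in a delivery pipeline. The following is only a minimal sketch under stated assumptions: the registry names, the `ImageMetadata` fields, and the `guardrail_violations` function are hypothetical and are not taken from any particular tool.

```python
from dataclasses import dataclass

# Hypothetical allow-list of certified base images (illustrative names only).
CERTIFIED_BASE_IMAGES = {
    "registry.example.com/base/python:3.12",
    "registry.example.com/base/java:21",
}


@dataclass(frozen=True)
class ImageMetadata:
    """Facts about a container image that a pipeline is assumed to collect."""
    name: str
    base_image: str
    scan_passed: bool
    signed: bool


def guardrail_violations(image: ImageMetadata) -> list[str]:
    """Return the guardrails the image violates; an empty list means it may deploy."""
    violations = []
    if image.base_image not in CERTIFIED_BASE_IMAGES:
        violations.append("base image is not certified")
    if not image.scan_passed:
        violations.append("security scan missing or failed")
    if not image.signed:
        violations.append("image is not signed")
    return violations


if __name__ == "__main__":
    candidate = ImageMetadata(
        name="shop/checkout:1.4.2",
        base_image="docker.io/library/python:3.12",
        scan_passed=True,
        signed=False,
    )
    for problem in guardrail_violations(candidate):
        print(f"BLOCKED {candidate.name}: {problem}")
```

The guiding principle stays a statement of intent that people interpret, while the guardrail becomes a rule that either passes or blocks a deployment.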
Being antifragile to change
Robust is out. Antifragile is in. Resilience is a way better mindset than robustness. You don’t know what type of storm is coming next, so there’s no need to put everything you’ve got into an earthquake-safe nuclear power plant if the next thing hitting you is a tsunami. You can disagree, but that’s what we believe in. Embracing change is so fundamental that it permeates all operating model dimensions.
In terms of antifragility, there is nothing more antifragile than open source software. Even COVID couldn’t stop open source software development, thanks to its distributed nature and its culture of information sharing and close collaboration. Antifragile is more than resilient. Antifragile “things” get better and stronger when exposed to stressors and change, in line with the old saying “What doesn’t kill you makes you stronger.” And that’s what the open source community has proven over many decades. When new challenges emerge, new projects emerge. After a while of trying out the best ideas and concepts, these projects and people converge into only a few, combining efforts to make those solutions enterprise-ready. And that’s antifragile, just like Nassim Nicholas Taleb likes it.
Metrics
There are really good metrics and really dumb metrics. But whether a metric is smart or dumb is hard to tell at the outset; it depends on how and for what it is used, and on whether it drives the right behaviors. Comparing teams by their user story points, for example, is poor management behavior. As a general rule, metrics should always be balanced; otherwise, the system can be gamed. We have probably all dealt with IT service hotlines that closed tickets before the issue was resolved. That behavior is driven by metrics that look only at ticket open times instead of balancing them with issues resolved. The DevOps Research and Assessment (DORA) metrics show this nicely: they balance speed with stability. Speed is measured by deployment frequency and lead time for changes, while stability is measured by change failure rate and mean time to restore (MTTR).
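As an illustration of how the four DORA metrics balance each other, here is a minimal sketch that derives them from a list of deployment records. The `Deployment` fields and the `dora_summary` function are assumptions made for this example, not part of any DORA tooling, and it uses simple averages where a real report might prefer medians.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class Deployment:
    """One production deployment; the field names are illustrative assumptions."""
    committed_at: datetime                  # when the change was committed
    deployed_at: datetime                   # when it reached production
    failed: bool = False                    # did it cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_summary(deployments: list[Deployment], period_days: int) -> dict[str, float]:
    """Summarize speed and stability for a non-empty list of deployments."""
    if not deployments:
        raise ValueError("need at least one deployment")
    failures = [d for d in deployments if d.failed]
    restore_hours = [
        (d.restored_at - d.deployed_at) / timedelta(hours=1)
        for d in failures
        if d.restored_at is not None
    ]
    return {
        # Speed
        "deployments_per_day": len(deployments) / period_days,
        "mean_lead_time_hours": mean(
            (d.deployed_at - d.committed_at) / timedelta(hours=1) for d in deployments
        ),
        # Stability
        "change_failure_rate": len(failures) / len(deployments),
        "mttr_hours": mean(restore_hours) if restore_hours else 0.0,
    }


if __name__ == "__main__":
    day = datetime(2024, 1, 1)
    history = [
        Deployment(day, day + timedelta(hours=4)),
        Deployment(day, day + timedelta(hours=8), failed=True,
                   restored_at=day + timedelta(hours=10)),
    ]
    print(dora_summary(history, period_days=2))
```

Reporting the two speed values next to the two stability values makes it harder to improve one side by quietly sacrificing the other, which is the balancing effect described above.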
On teaming
We believe in setting up long-lived teams, and ample research supports this. A short but great read is the book Team Topologies if you want to dive deeper into this topic. For example, if you look into Dunbar’s number, you will find that there is a maximum group size. Above that maximum, smaller sub-groupings appear because we humans can’t develop trust among a large number of teammates.
Second, teams have been shown to go through distinct phases: Forming, Storming, Norming, and Performing. Because a team only becomes effective in the last two phases, you should not allow short-lived feature teams to be the norm in your organization.
Then, on an individual basis, you can study the results around the intrinsic motivational factors of the knowledge worker, such as autonomy, mastery, and purpose. In an ideal world, your operating model will balance all this out.
On architecture
Conway’s law suggests that systems end up mirroring the structure of the organization that builds them. So, it seems to be a good idea to define the architecture first and then set up a team structure to match. In nearly all organizations we know, it’s done the other way around.
As you can see, there’s a lot to do.
So, let’s get into it.