The cloud native journey

Companies large and small, new and mature, are seeing the benefits of cloud computing. There are many paths to the cloud, and the right one often depends on the maturity of the organization and the willingness of senior management to enact the change required. Regardless of the type of organization, the shift to cloud computing is a journey that will take time, effort, and persistence to be successful. It is easy for a company to say it wants to be cloud native; however, for most companies, getting there is a complex and difficult prospect. Mature organizations that have many legacy workloads and manage their own data centers will have to not only identify a roadmap and a migration plan, but also manage the people and process aspects of the journey. Newer companies that don't have a lot of technical debt in the form of traditional workloads will move faster, with the cloud being the place of early experimentation; however, maturing into a cloud native enterprise will still take time.

The decision to be cloud-first

Cloud computing is here to stay. Years ago there were many discussions on whether a company should declare a cloud-first model, or not chase the latest and greatest technologies; however, at this point in time, just about every company has taken the first step towards cloud computing, and many have made the decision to be a cloud-first organization. At its most basic level, making this decision simply means that all new workloads will be deployed to the chosen cloud vendor unless it is proven that this will not be sufficient for the business requirements. Sometimes this happens due to information security (that is, government-classified or regulatory conditions), and sometimes it's because of a specific technical issue or limitation that the cloud vendor has, which is difficult to overcome in a short time. Regardless, the vast majority of new projects will end up in the cloud with various stages of maturity, as described in the CNMM earlier.

Even though this decision is common in today's IT environment, there are still challenges that need to be addressed for it to be successful. IT and business leaders need to ensure that their people and processes are aligned to a cloud-first model. In addition, developing a DevOps and agile methodology will help the organization overcome the slow and rigid nature of waterfall projects with siloed development and operations teams.

People and process changes in the cloud

Organizations with large IT departments or long-term outsourced contracts will inherently have a workforce that is skilled in the technologies that have been running in the company up until that point. Making the shift to any new technology, especially cloud computing, will require a significant amount of retooling, personnel shifting, and a change in the thought pattern of the workforce. Organizations can overcome these people challenges by splitting their IT workforce into two distinct sections: those who maintain the legacy workloads and keep the original methodologies, and those who work on the cloud-first model and adopt the new technologies and processes needed to be successful. This approach can work for a while; however, over time, and as workloads are moved to the target cloud platform, more and more people will shift to the new operating model. The benefit of this approach is that it allows a select few people who are passionate and excited to learn new technologies and techniques to be the trail-blazers, while the rest of the workforce can retool their skills at a more methodical pace.

One specific area that can often be difficult for experienced IT professionals to overcome, especially if they have gained their experience with data center deployments and lots of large legacy workloads, is the concept of unlimited resources. Since most cloud vendors have effectively unlimited resources to be consumed, removing that constraint on application design opens up many innovative ways to solve problems that were previously impractical. For example, being bound to a fixed number of CPUs to complete a batch job pushes developers toward designs with little parallelization, whereas with effectively unlimited CPUs, the entire job could be designed to run in parallel, potentially faster and cheaper than with lots of serial executions. Those people who can think big and remove constraints should be considered for the trail-blazers team.
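The batch job example can be sketched in a few lines. This is a minimal illustration, not a production design: process_chunk is a hypothetical stand-in for one unit of batch work, and a local thread pool stands in for what would, in the cloud, be separate instances or functions working on each chunk at the same time. The point is only the shape of the design: once the job is split into independent chunks, the degree of parallelism becomes a parameter rather than a constraint.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(chunk):
    """Hypothetical unit of batch work: here, a sum of squares."""
    return sum(x * x for x in chunk)


def run_serial(chunks):
    """Constrained design: one worker handles every chunk in turn."""
    return [process_chunk(chunk) for chunk in chunks]


def run_parallel(chunks, max_workers):
    """Unconstrained design: every chunk can be handled at the same time."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        return list(pool.map(process_chunk, chunks))


# Ten independent chunks of 1,000 numbers each.
chunks = [list(range(start, start + 1000)) for start in range(0, 10000, 1000)]

serial_result = run_serial(chunks)
parallel_result = run_parallel(chunks, max_workers=len(chunks))
```

Both functions produce the same results; only the scheduling differs. Local threads in Python will not speed up CPU-bound work, so any real gain would come from dispatching the chunks to separate cloud resources.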

Processes are also a big hurdle for being a cloud-first organization. Many companies that are transitioning to the cloud are also transitioning from SOA to microservices. It is therefore common for the processes in place to support SOA architectures and deployments; these processes are most likely there to slow things down and ensure that big bang deployments to the composite application are done correctly and with significant testing. With a cloud-first model and microservices, the goal is to deploy as fast and as often as possible to support quickly changing business requirements, so modifying processes to support this agility is critical. For example, an organization that is strictly following ITIL might require a strict approval chain with checks and balances before any modification or code deployment can be made to production. This process is probably in place because of the complex, interconnected nature of composite applications, where one minor change could impact the entire set of systems. Microservices, however, are (usually) fully self-contained and only publish an API, so as long as the API does not change, a change to the code itself should not impact other services. Changing processes to allow for lots of smaller deployments or rollbacks will ensure speed and business agility.
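One way to replace a manual approval chain with an automated check is to compare the published API before and after a change. The sketch below assumes a deliberately simplified, hypothetical contract format (a mapping from endpoint to the list of response fields); real services would more likely compare formal API descriptions. Additive changes pass, while removed endpoints or fields are flagged as potentially breaking.

```python
def breaking_changes(old_api, new_api):
    """Return a list of changes that could break existing consumers.

    Both arguments map an endpoint name to its list of response fields
    (a hypothetical, simplified contract format).
    """
    problems = []
    for endpoint, old_fields in old_api.items():
        if endpoint not in new_api:
            problems.append(f"removed endpoint: {endpoint}")
            continue
        for field in sorted(set(old_fields) - set(new_api[endpoint])):
            problems.append(f"removed field: {endpoint} -> {field}")
    return problems


old_api = {"GET /orders/{id}": ["id", "status", "total"]}

# Additive change: a new field and a new endpoint, nothing removed.
additive_api = {
    "GET /orders/{id}": ["id", "status", "total", "currency"],
    "GET /orders/{id}/items": ["sku", "quantity"],
}

# Breaking change: the "status" field has been dropped.
reduced_api = {"GET /orders/{id}": ["id", "total"]}

safe = breaking_changes(old_api, additive_api)
unsafe = breaking_changes(old_api, reduced_api)
```

A deployment pipeline could let changes with an empty result through automatically and reserve human review for the rest.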

Agile and DevOps

The cloud is not a magic place where problems go away. It is a place where some of the traditional challenges go away; however, new challenges will come up. Legacy enterprise companies have been making the switch from waterfall project management to agile for a while now. That is good news for a company intending to be cloud native, since iteration, failing fast, and innovation are critical to long-term success, and agile projects allow for that type of project delivery. A large part of the reason this methodology is popular with cloud native companies is the fast pace of innovation that cloud vendors are going through. For example, AWS launched 1,430 new services and features in 2017, which is almost four per day, and it is set to eclipse that again in 2018. With this level of innovation happening, cloud services are changing, and using an agile methodology to manage cloud native projects will enable companies to take advantage of these as they come out.

DevOps (or the merging of development teams and operations teams) is a new IT operating model that helps bridge the gap between how code is developed and how it is operated once deployed to production. Making a single team accountable for the code logic, testing, deployment artifacts, and operation of the system will ensure that nothing is lost in the life cycle of a code development process. This model fits well with the cloud and microservices, since it enables a small team to own the full service, write in whatever language they are most suited to, deploy to the cloud platform chosen by the company, and then operate that application and be in the best position to resolve any issues the application might have once it's in production.

Together, agile methodologies and DevOps are a critical change needed by companies that are considering the move to becoming a cloud native organization.

Cloud operating environment

The journey to the cloud will take time and lots of trial and error. Typically, a company will have identified a primary cloud vendor for their requirements and, in some cases, they will have a second cloud vendor for specific requirements. In addition, almost all companies begin with a hybrid architecture approach, which allows them to leverage their existing investments and applications while moving workloads into their chosen cloud. Often, the cloud native journey begins with a single workload being either migrated or designed for the cloud, which gives critical experience to the design team and helps create the operating foundation the organization will use for the cloud.

Cloud operating foundation

The cloud is a vast set of resources that can be used to solve all kinds of business problems; however, it is also a complex set of technologies that requires not only skillful people to use it, but also a strict operating foundation to ensure it is being done securely and with cost and scale in mind. Even before a single workload is deployed to the cloud, it is important for a company to fully identify its expected foundational design. This would include everything from account structures, virtual network design, and regional/geographic requirements to the security structure (in areas such as identity and access management and compliance) and governance considerations with regard to the specific services to be used for different types of workloads. Understanding how to leverage Infrastructure as Code, as pointed out in the automation axis earlier, is also a critical element that should be identified early.
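To make the idea of a foundation captured as code more concrete, the sketch below describes accounts and regional rules as data and checks them automatically. Everything in it is an assumption made for illustration: the Account and Foundation classes, the region names, and the two rules are hypothetical, and a real foundation would normally be expressed in the Infrastructure as Code tooling of the chosen cloud vendor.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    name: str
    environment: str           # for example "prod" or "dev"
    regions: list
    mfa_required: bool = True   # identity and access management setting


@dataclass
class Foundation:
    allowed_regions: set        # regional/geographic requirement
    accounts: list = field(default_factory=list)


def violations(foundation):
    """Return every way the declared accounts break the foundation rules."""
    found = []
    for account in foundation.accounts:
        for region in account.regions:
            if region not in foundation.allowed_regions:
                found.append(f"{account.name}: region {region} is not allowed")
        if account.environment == "prod" and not account.mfa_required:
            found.append(f"{account.name}: production account must require MFA")
    return found


# Hypothetical foundation with two approved regions and three accounts.
foundation = Foundation(
    allowed_regions={"region-a", "region-b"},
    accounts=[
        Account("shared-network", "prod", ["region-a"]),
        Account("sandbox", "dev", ["region-a", "region-c"]),
        Account("payments", "prod", ["region-b"], mfa_required=False),
    ],
)

problems = violations(foundation)
```

Because the design lives in a file rather than in a document, it can be reviewed, versioned, and checked before any workload is deployed.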

Once all of the decisions are made and the cloud operating foundation is in place, that is the time for the initial projects to begin. Between the decision-making process and the first few projects being deployed, the DevOps teams will gain lots of experience with the agile pace of working, the target cloud vendor platform, and the company's set of guidelines and approaches to its cloud native environment.

Hybrid cloud

In addition to the foundation of the cloud platform, a company must decide how to leverage its existing assets. While the value proposition of cloud computing is not debated much anymore, the pace of migration and how fast to deprecate existing assets is. Using a hybrid cloud approach for the beginning of the cloud native journey is very common, and lets the company easily operate with its two existing groups (the legacy group and the cloud-first group). This approach will also enable a cheaper pathway to success, since it doesn't require a 'big bang' migration from existing data centers to the cloud, but allows for individual projects, business units, or other segregated areas to move faster than others.

All cloud vendors have a hybrid architecture option that can be leveraged when a company wants to keep some workloads in their data centers and have others in the cloud. This hybrid architecture approach typically involves setting up some type of network connectivity from one or more data center(s) to one or more cloud vendor geographical region(s). This network connectivity can take place in the form of a VPN over public internet paths, or various dedicated fiber options. Regardless of the type of connectivity, the outcome should be a single network that makes all company resources and workloads visible to each other (within security and governance constraints). Typical patterns for a hybrid cloud architecture are:

Legacy workloads on-premises and new projects in the cloud
Production workloads on-premises and non-production in the cloud
Disaster recovery environment in the cloud
Storage or archival in the cloud
On-premises workloads bursting into the cloud for additional capacity
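A practical prerequisite for treating the data center and the cloud as a single routed network is that their address ranges do not collide. The following sketch uses Python's standard ipaddress module to find overlaps between on-premises and cloud ranges; the CIDR blocks shown are made-up examples, not recommendations.

```python
import ipaddress


def overlapping_ranges(on_prem_cidrs, cloud_cidrs):
    """Return (on-premises, cloud) pairs of address ranges that overlap."""
    on_prem_networks = [ipaddress.ip_network(cidr) for cidr in on_prem_cidrs]
    cloud_networks = [ipaddress.ip_network(cidr) for cidr in cloud_cidrs]
    conflicts = []
    for on_prem in on_prem_networks:
        for cloud in cloud_networks:
            if on_prem.overlaps(cloud):
                conflicts.append((str(on_prem), str(cloud)))
    return conflicts


# Hypothetical address plan for one data center and two cloud networks.
on_prem = ["10.0.0.0/16", "192.168.10.0/24"]
cloud = ["10.1.0.0/16", "10.0.128.0/20"]

conflicts = overlapping_ranges(on_prem, cloud)
```

Here the second cloud range falls inside the first data center range, so it would need to be renumbered before the two environments are connected.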

Over time, as more workloads are migrated into the cloud or retired from the on-premises environments, the center of gravity will shift to the cloud and an organization will have more resources in the cloud than on-premises. This progress is natural, and will signify the tipping point of a company that is well into its cloud native journey. Eventually, a cloud native company would expect to have all of its workloads in the cloud and remove just about all hybrid connectivity options since they are no longer in use. At that point, the organization would be considered a mature cloud native company.

Multi-cloud

Enterprise companies need to ensure their risk is spread out so that they reduce the blast radius in the event of an issue, whether this be a natural disaster, security event, or just making sure that they are covering their customers in all of the locations they operate in. Therefore, the allure of a multi-cloud environment is strong, and some larger organizations are starting to go down this path for their cloud journey. In the right circumstances, this approach does make sense and gives the additional assurance that their business can withstand specific types of challenges; however, for most companies this type of architecture is going to add significant complexity and possibly slow down adoption of the cloud.

The myth of multi-cloud deployments and architectures is often spread around by system integrators that thrive on complexity and change management. They want to promote the most complex and design-heavy architecture possible, so that a company feels compelled to leverage them more to ensure their IT operations are running smoothly. Multi-cloud is the most recent way of going about this, since taking this route will require twice the amount of cloud-specific knowledge and twice the amount of hybrid or intercloud connectivity. Often, there is a promise of a cloud broker, where a single platform can manage resources in multiple clouds and on-premises to make the cloud operations easier. The challenge with this school of thought is that these cloud brokers are really just exposing the lowest common denominator of the cloud vendors, typically instances, storage, load balancers, and so on, and do not have the ability to allow use of the most innovative services from the chosen cloud vendors. This will stifle cloud native architecture innovation and force the company into a similar operating model as they used before the cloud, often paying another company to manage the environments for them and not gaining much from their cloud journey.

Another common approach to multi-cloud is the use of containers for moving workloads between clouds. In theory, this approach works and solves a lot of the challenges that multi-cloud poses. There is currently a lot of innovation going on in this area, but the ability to successfully move containers between clouds is still in its infancy. As additional frameworks and tools appear and the approach matures, this is an area that could promise a new way to create cutting-edge cloud native architectures.

Companies that are in their cloud native journey and are considering a multi-cloud approach should ask themselves the reasons why this is being considered. The authors of this book would argue that organizations would gain more speed and efficiency in the early and middle parts of their journey if they choose a single cloud vendor and focus all of their re-tooling, efforts, and people on that, versus trying to add a second cloud into the design. Ultimately, choose the path that will best serve the needs of the business and that will fit culturally into the organization.

Application migration at scale

Companies will start off the journey with the decision to be a cloud-first organization and the creation of a DevOps team, and will then continue with choosing a cloud vendor and setting up the target cloud operating foundation. Soon after these activities are complete, the time to scale out and ramp up migrations begins. A cloud native company will have the goal of reducing its self-managed data centers and workloads and shifting those as much as possible to the cloud. There are three main paths this can take:

Lift-and-shift migration of legacy workloads to the cloud
Re-engineering of legacy workloads to optimize in the cloud
Greenfield cloud native development

For most large enterprise companies, all three of these options will take place with different parts of the legacy workloads. For smaller companies, any mix of the three could be employed, depending on the outcomes being sought.

Lift-and-shift migration

Lift-and-shift migration is the act of moving existing workloads, as is, to the target cloud operating foundation already implemented. This type of exercise is usually done against a grouping of applications, by business unit, technology stack, complexity level, or some other type of metric. A lift-and-shift migration in its purest form is literally a bit-by-bit copy of existing instances, databases, storage, and so on, and is actually rare, since the cost benefits of doing this to the cloud would be negligible. For example, moving 100 instances from an on-premises data center to the cloud, with no changes to size and without taking scaling options into consideration, would most likely result in a higher cost to the company.
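The effect of sizing and scaling on the cloud bill can be shown with simple arithmetic. All figures below are invented purely for illustration (the hourly rates, the 730-hour month, and the split between always-on and part-time instances are assumptions, not vendor prices); the sketch only shows why copying 100 always-on, unchanged instances leaves most of the potential savings unused.

```python
HOURS_PER_MONTH = 730  # common approximation of the hours in one month


def monthly_cost_cents(instance_count, hourly_rate_cents, hours=HOURS_PER_MONTH):
    """Monthly cost in cents for identical instances running `hours` each."""
    return instance_count * hourly_rate_cents * hours


# As-is copy: 100 instances at a hypothetical 40 cents/hour, always on.
as_is = monthly_cost_cents(100, 40)

# Right-sized alternative (hypothetical): smaller instances at 20 cents/hour,
# 60 of them always on and 40 running only 200 hours a month.
right_sized = monthly_cost_cents(60, 20) + monthly_cost_cents(40, 20, hours=200)

savings = as_is - right_sized
```

With these made-up inputs, the as-is copy costs 2,920,000 cents a month against 1,036,000 cents for the right-sized design. The real comparison depends entirely on actual utilization and pricing.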

The more common derivative of a lift-and-shift is a lift-tinker-shift migration, where the majority of the workloads are moved; however, specific components are upgraded or swapped out for cloud services. For example, moving 100 instances from an on-premises data center to the cloud, but standardizing on a specific operating system (for example, Red Hat Enterprise Linux), moving all databases into a cloud vendor managed service (for example, Amazon Relational Database Service), and storing backup or archive files in a cloud blob storage service (for example, Amazon Simple Storage Service) would constitute a lift-tinker-shift migration. This type of migration would most likely save the company a lot of money for the business case, take advantage of some of the most mature services in the cloud, and allow for significant long-term advantages with future deployments.

Re-engineer migration

Companies that are truly moving to be a cloud native organization will most likely choose to re-engineer most of their legacy workloads so that they can take advantage of the scale and innovation that the cloud has to offer. Workloads that are chosen to be migrated to the cloud but re-engineered in the process might take longer to move, but once completed they will fall on some part of the CNMM and be considered cloud native. These types of migrations are not quite greenfield development projects, but they are not lift-and-shift migrations either; they are designed to have significant portions of the application workloads rewritten or replatformed so that they fit the cloud native standards. For example, consider a composite application that contains 100 instances using a traditional SOA architecture, with five distinct workloads and an ESB to mediate traffic. To re-engineer this composite application, the company would decide to remove the ESB, break the distinct workloads into more function-based microservices, remove as many instances as possible by leveraging serverless cloud services, and reformat the database to be NoSQL instead of relational.

Migrating workloads using a re-engineering approach is a good way for the trail-blazers of a company's DevOps team to create a significant project, dive deep into the design of the architecture, and employ all of their new skills and techniques for the cloud native journey. We believe that, over time, the majority of migration projects will be re-engineering existing workloads to take advantage of cloud computing.

Cloud native companies

While technically not a migration, cloud native companies that are creating new applications will choose to go through the entire development cycle with a cloud native architecture in mind. Even workloads that are re-engineered might not be able to fully change their underlying technologies for whatever reason. When a company chooses to go with full cloud native development, legacy approaches to development, scale constraints, slow deployments, and legacy processes and skill sets are left behind, and only the latest and greatest cloud services, architectures, and techniques are employed. Companies that have gotten to this phase of the journey are truly cloud native, and are set up for long-term success with how they develop and deploy business applications.