
How-To Tutorials - Cloud Computing

121 Articles

Deploying First Server

Packt
04 Jan 2017
16 min read
In this article by Kirill Shirinkin, the author of the book Getting Started with Terraform, we will learn how exactly Terraform works and how to use it. We will learn a bit about Terraform's history, install it on our workstation, prepare a working environment, and run the tool for the first time. With everything ready for work, we will figure out what a Terraform provider is and then take a quick tour of what AWS and EC2 are. With this knowledge in place, we will first create an EC2 instance by hand (just to understand the pain that Terraform will eliminate) and then do exactly the same with the help of a Terraform template. That will allow us to study the nature of the Terraform state file.

(For more resources related to this topic, see here.)

History of Terraform

Terraform was first released in July 2014 by a company called HashiCorp. That is the same company that brought us tools like Vagrant, Packer, Vault, and some others. Being the fifth tool in the HashiCorp stack, it was focused on describing the complete infrastructure as code.

… From physical servers to containers to SaaS products, Terraform is able to create and compose all the components necessary to run any service or application. With Terraform, you describe your complete infrastructure as code, even as it spans multiple service providers. Your servers may come from AWS, your DNS may come from CloudFlare, and your database may come from Heroku. Terraform will build all these resources across all these providers in parallel. Terraform codifies knowledge about your infrastructure unlike any other tool before, and provides the workflow and tooling for safely changing and updating infrastructure.

- https://www.hashicorp.com/blog/terraform.html

Terraform is an open source tool released under the Mozilla Public License, version 2.0. The code is stored (as with all other tools by HashiCorp) on GitHub and anyone can contribute to its development. As part of its Atlas product, HashiCorp also offers hosted Terraform Enterprise services, which solve some of the problems that the open source version doesn't. These include a central facility to run Terraform from, access control policies, remote state file storage, notifications, built-in GitHub integration, and more.

Despite support for over 40 different providers, the main focus of the HashiCorp developers is on Amazon Web Services, Google Cloud, and Microsoft Azure. All other providers are developed and supported by the community, meaning that if you are not using the main three then you might have to contribute to the codebase yourself.

The code of Terraform is written in the Go programming language and is released as a single binary for all major operating systems. Windows, Mac OS X, FreeBSD, OpenBSD, Solaris, and any Linux are supported in both 32-bit and 64-bit versions. Terraform is still a relatively new piece of tech, being just a bit over two years old. It changes a lot over time and gets new features with every release.

After learning these facts, let's finally proceed to installing Terraform and setting up our workplace.

Preparing the work environment

In this article we will focus on using Terraform in a Linux environment. The general usage of the tool should be the same on all platforms, though some advanced topics and practices may differ. As mentioned in the previous section, Terraform is distributed as a single binary, packaged inside a Zip archive. Unfortunately, HashiCorp does not provide native packages for operating systems. That means the first step is to install unzip.
Depending on your package manager, this could be done by running sudo yum install unzip or sudo apt-get install unzip, or it might even be installed already. In any case, after making sure that you can un-archive Zip files, proceed to downloading Terraform from the official website: https://www.terraform.io/downloads.html. Unzip it to any convenient folder and make sure that this folder is available in your PATH environment variable. The full installation command sequence could look like this:

$> curl -O https://releases.hashicorp.com/terraform/0.7.2/terraform_0.7.2_linux_amd64.zip
$> sudo unzip terraform_0.7.2_linux_amd64.zip -d /usr/local/bin/

That will extract the Terraform binary to /usr/local/bin, which is already available in PATH on Linux systems. Finally, let's verify our installation:

$> terraform -v
Terraform v0.7.2

We have a working Terraform installation now, and we are ready to write our first template. First, create an empty directory named packt-terraform and enter it:

$> mkdir packt-terraform && cd packt-terraform

When you run terraform commands, Terraform looks for files with the .tf extension in the directory you run it from. Be careful: Terraform will load all files with the .tf extension if you run it without arguments. Let's create our very first, not very useful yet, template:

$> touch template.tf

To apply a template, you need to run the terraform apply command. What does this applying mean? In Terraform, when you run apply, it reads your templates and tries to create an infrastructure exactly as it's defined in them. For now, let's just apply our empty template:

$> terraform apply
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

After each run finishes, you get the number of resources you've added, changed, and destroyed. In this case, it did nothing, as we just have an empty file instead of a real template. To make Terraform do something useful, we first need to configure our provider, and even before that we need to find out what a provider is.

What are Terraform providers?

A provider is something you use to configure access to the service you create resources for. For example, if you want to create AWS resources, you need to configure the AWS provider, which specifies the credentials to access the APIs of the many AWS services.

At the moment of writing, Terraform has 43 providers. This impressive list includes not only major cloud providers like AWS and Google Cloud, but also smaller services, like Fastly, a Content Delivery Network (CDN) provider. Not every provider requires explicit configuration. Some of them do not even deal with external services; instead, they provide resources for local entities. For example, you could use the TLS provider to generate keys and certificates. Still, most providers deal with one external API or another and require configuration.

In this article we will be using the AWS provider. Before we configure it, let's have a short introduction to AWS. If you are already familiar with this platform, feel free to skip the next section and proceed directly to Configuring the AWS provider.

Short introduction to AWS

Amazon Web Services is the cloud offering from Amazon, the online retail giant. Back in the early 2000s, Amazon invested in an automated platform that would provide services for things like networking, storage, and computation to developers. Developers then don't need to manage the underlying infrastructure. Instead, they use the provided services via APIs to provision virtual machines, storage buckets, and so on.
The platform, initially built to power Amazon itself, was opened for public usage in 2006. The first two services, Simple Storage Service (S3) and Elastic Compute Cloud (EC2), were released, and anyone could pay to use them.

Fast forward 10 years: AWS now has over 70 different services, covering practically everything a modern infrastructure would need. It has services for virtual networking, queue processing, transactional emails, storage, DNS, relational databases, and many others. Businesses like Netflix have moved away from in-house hardware completely and are instead building a new type of infrastructure on top of cloud resources, getting significant benefits in terms of flexibility and cost savings and focusing on working on a product rather than on scaling and maturing their own data centers.

With such an impressive list of services, it becomes increasingly hard to juggle all the involved components via the AWS Management Console: the in-browser interface for working with AWS. Of course, AWS provides APIs for every service it has, but once again: their number and intersections can be very high, and they only grow as you keep relying on the cloud. This leads to exactly the kind of problem where you end up either with intense ClickOps practices or with scripting everything you can. These problems make AWS a perfect candidate for exploring Terraform, as we can fully understand the pain caused by direct usage of its services.

Of course, AWS is not free to use, but luckily it has provided a Free Tier for a long time now. The Free Tier allows you to use lots of (but not all) services for free, with certain limitations. For example, you can use a single EC2 instance for 750 hours a month for 12 months for free, as long as it has the t2.micro type. EC2 instances are, simply, virtual servers. You pay for them per hour of usage, and you can choose from a pre-defined list of types. Types are just different combinations of characteristics: some are optimized for high memory usage, others were created for processor-heavy tasks.

Let's create a brand new AWS account for our Terraform learning goals, as follows:

Open https://aws.amazon.com/free/ and click Create a free account.
Follow the on-screen instructions to complete registration.

Please note that in order to use the Free Tier you have to provide your credit card details, but you won't be charged unless you exceed your free usage limit.

Using Elastic Compute Cloud

Creating an instance through the Management Console

Just to get a feel for the AWS Management Console and to fully understand how much Terraform simplifies working with AWS, let's create a single EC2 instance manually:

Log in to the console and choose EC2 from the list of services.
Click on Launch Instance.
Choose AWS Marketplace from the left sidebar, type CentOS in the search box, and click the Select button for the first search result.
On each of the next pages just click Next till you reach the end of the process.

As you can see, it's not really a fast process to create a single virtual server on EC2. You have to choose an AMI and an instance type, configure network details and permissions, select or generate an SSH key, properly tag it, pick the right security groups, and add storage. Imagine if your day consisted only of manual tasks like this. What a boring job it would be!

An AMI is the source image an instance is created from. You can create your own AMIs, use the ones provided by AWS, or select one from the community at the AWS Marketplace.

A Security Group (SG) is like a firewall. You can attach multiple SGs to an instance and define inbound and outbound rules. They allow you to configure access not only for IP ranges, but also for other security groups.
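The same rules can also be scripted rather than clicked through. As a hedged illustration of what that looks like with the AWS CLI covered in the next section (the group name, description, and rule below are made up for this example and are not part of the original walkthrough):

$> aws ec2 create-security-group --group-name my-sg --description "SSH access"
$> aws ec2 authorize-security-group-ingress --group-name my-sg --protocol tcp --port 22 --cidr 0.0.0.0/0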
And, of course, we have looked at only a single service, EC2. As you already know, there are over 70 of them, each with its own interface to click through. Let's now take a look at how to achieve the same with the AWS CLI.

Creating an instance with the AWS CLI

AWS provides a CLI to interact with its APIs. It's written in Python. You can follow the installation instructions from the official guide to get started: https://aws.amazon.com/cli/.

Perhaps the most important part of setting up the AWS CLI is configuring access keys. We will also need these keys for Terraform. To get them, click on your username in the top-right part of the AWS Management Console, click on Security Credentials, and then download your keys from the Access Keys (Access Key ID and Secret Access Key) menu.

Warning: using root account access keys is considered a bad practice when working with AWS. You should use IAM users and per-user keys. For the needs of this article root keys are okay, but as soon as you move production systems to AWS, consider using IAM and reducing root account usage to a minimum.

Once the AWS CLI is installed, run the aws configure command. It will prompt you for your access keys and region. Once you are finished, you can use it to talk to the AWS API. Creating an EC2 instance will look like this:

$> aws ec2 run-instances --image-id ami-xxxxxxxx --count 1 --instance-type t2.micro --key-name MyKeyPair --security-groups my-sg

While this is already much better than doing it from the Management Console, it's still a long command to execute, and it covers only creation. For tracking whether the instance is still there, updating it, and destroying it, you need to construct similarly long command-line calls. Let's finally do it properly: with Terraform.

Configuring the AWS provider

Before using Terraform to create an instance, we need to configure the AWS provider. This is the first piece of code we will write in our template. Templates are written in a special language called HashiCorp Configuration Language (HCL): https://github.com/hashicorp/hcl. You can also write your templates in JSON, but that is recommended only if the template itself is generated by a machine. There are several ways to configure credentials:

Static credentials

With this method, you just hard-code your access keys right inside your template. It looks like this:

provider "aws" {
  access_key = "xxxxxxxxxxxxx"
  secret_key = "xxxxxxxxxxxxx"
  region     = "us-east-1"
}

Though the simplest, it is also the least flexible and least secure option. You don't want to give your credentials just like this to everyone in the team. Rather, each team member should use his or her own keys.

Environment variables

If credentials are not specified in the template, Terraform will try to read them from the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY. You can also set your region with the AWS_DEFAULT_REGION variable. In this case, the complete configuration goes down to:

provider "aws" {}

Credentials file

If Terraform doesn't find keys in the template or in environment variables, it will try to fetch them from a credentials file, which is typically stored in ~/.aws/credentials. If you previously installed and configured the AWS CLI, then you already have a credentials file generated for you. If you did not, you can add it yourself, with content like this:

[default]
aws_access_key_id = xxxxxxxxxxxxx
aws_secret_access_key = xxxxxxxxxxxxx

You should always avoid setting credentials directly in the template.
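If you go the environment variable route, exporting the keys in the shell you run Terraform from is all the configuration needed. A minimal sketch — the values are placeholders for your own keys and region:

$> export AWS_ACCESS_KEY_ID="xxxxxxxxxxxxx"
$> export AWS_SECRET_ACCESS_KEY="xxxxxxxxxxxxx"
$> export AWS_DEFAULT_REGION="eu-central-1"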
Whether you use environment variables or the credentials file is up to you. Whichever method you pick, let's add the following configuration to template.tf:

provider "aws" {
  region = "eu-central-1"
}

Running terraform apply still won't do anything, because we did not specify any resources we want our infrastructure to have. Let's do that now.

Creating an EC2 instance with Terraform

Resources are components of your infrastructure. A resource can be something as complex as a complete virtual server, or something as simple as a DNS record. Each resource belongs to a provider, and the type of the resource is prefixed with the provider name. The configuration of a resource then takes this form:

resource "provider-name_resource-type" "resource-name" {
  parameter_name = parameter_value
}

There are three types of things you can configure inside a resource block: resource-specific parameters, meta-parameters, and provisioners. For now, let's focus on resource-specific parameters. They are unique to each resource type.

We will create an EC2 instance, which is created with the aws_instance resource. Some parameters are required, while others are optional; to create an instance we need to set at least the two required ones, ami and instance_type. You can always check the complete list of available parameters in the docs, on the page dedicated to the particular resource. For example, to get the list and description of all aws_instance resource parameters, check out https://www.terraform.io/docs/providers/aws/r/instance.html.

We'll be using the official CentOS 7 AMI. As we configured the AWS region to be eu-central-1, we need the AMI ID for that region. We will use the t2.micro instance type, as it's the cheapest one and is available as part of the Free Tier offering. Update the template to look like this:

# Provider configuration
provider "aws" {
  region = "eu-central-1"
}

# Resource configuration
resource "aws_instance" "hello-instance" {
  ami           = "ami-378f925b"
  instance_type = "t2.micro"

  tags {
    Name = "hello-instance"
  }
}

You might also need to specify the subnet_id parameter if you don't have a default VPC; in that case you will need to create a VPC and a subnet first.

As you noticed, HCL allows commenting your code using a hash sign in front of the text you want commented. There is another thing to look at: the tags parameter. Terraform is not limited to simple string values. You can also have numbers, boolean values (true, false), lists (["elem1", "elem2", "elem3"]), and maps. The tags parameter is a map of tags for the instance.

Let's apply this template!

$> terraform apply
aws_instance.hello-instance: Creating...
  ami:                      "" => "ami-378f925b"
  < ……………………. >
  instance_type:            "" => "t2.micro"
  key_name:                 "" => "<computed>"
  < ……………………. >
  tags.%:                   "" => "1"
  tags.Name:                "" => "hello-instance"
  tenancy:                  "" => "<computed>"
  vpc_security_group_ids.#: "" => "<computed>"
aws_instance.hello-instance: Still creating... (10s elapsed)
aws_instance.hello-instance: Still creating... (20s elapsed)
aws_instance.hello-instance: Still creating... (30s elapsed)
aws_instance.hello-instance: Creation complete

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

The state of your infrastructure has been saved to the path below. This state is required to modify and destroy your infrastructure, so keep it safe. To inspect the complete state use the terraform show command.

State path: terraform.tfstate

Wow, that's a lot of output for a simple command creating a single instance. Some parts of it were replaced with arrows-wrapped dots above, so don't be surprised when you see even more parameter values when you actually run the command. Before digging into the output, let's first verify in the AWS Management Console that the instance was really created. With just 12 lines of code and a single Terraform command invocation, we got our EC2 instance running.
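A quick way to confirm that the template, the state file, and the real infrastructure now agree is to re-run Terraform's inspection commands. A minimal sketch — the exact output wording differs between Terraform versions:

$> terraform show   # prints the attributes of aws_instance.hello-instance recorded in terraform.tfstate
$> terraform plan   # should report no resources to add, change, or destroy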
So far the result is not that different from using the AWS CLI, though: we only created a resource. What is of more interest is how we update and destroy this instance using the same template, and to understand how Terraform does that we need to learn what the state file is.

Summary

In this article we learned how Terraform works, installed it, and created our first EC2 instance from a template. From here, the next steps are to update the server using the same template and, finally, destroy it; with that you will have a solid knowledge of Terraform basics and be ready to template your existing infrastructure.

Resources for Article:

Further resources on this subject:

Provision IaaS with Terraform [article]
Start Treating your Infrastructure as Code [article]
OpenStack Networking in a Nutshell [article]


Planning for Failure (and Success)

Packt
27 Dec 2016
24 min read
In this article by Michael Solberg and Ben Silverman, the authors of the book OpenStack for Architects, we will be walking through how to architect your cloud to avoid hardware and software failures. The OpenStack control plane is composed of web services, application services, database services, and a message bus. Each of these tiers requires a different approach to make it highly available, and some organizations will already have defined architectures for each of the services. We've seen that customers either reuse those existing patterns or adopt new ones which are specific to the OpenStack platform. Both of these approaches make sense, depending on the scale of the deployment, and many successful deployments actually implement a blend of the two. For example, if your organization already has a supported pattern for highly available MySQL databases, you might choose that pattern instead of the one outlined in this article. If your organization doesn't have a pattern for highly available MongoDB, you might have to architect a new one.

(For more resources related to this topic, see here.)

Building a highly available control plane

Back in the Folsom and Grizzly days, coming up with a high availability (H/A) design for the OpenStack control plane was something of a black art. Many of the technologies recommended in the first iterations of the OpenStack High Availability Guide were specific to the Ubuntu distribution of Linux and were unavailable on the Red Hat Enterprise Linux-derived distributions. The now-standard cluster resource manager (Pacemaker) was unsupported by Red Hat at that time. As such, architects using Ubuntu might use one set of software, those using CentOS or RHEL might use another set, and those using a Rackspace or Mirantis distribution might use yet another. These days, however, the technology stack has converged and the H/A pattern is largely consistent regardless of the distribution used.

About failure and success

When we design a highly available OpenStack control plane, we're looking to mitigate two different scenarios:

The first is failure. When a physical piece of hardware dies, we want to make sure that we recover without human interaction and continue to provide service to our users.
The second, and perhaps more important, scenario is success.

Software systems always work as designed and tested until humans start using them. While our automated test suites will try to launch a reasonable number of virtual objects, humans are guaranteed to attempt to launch an unreasonable number. Also, many of the OpenStack projects we've worked on have grown far past their expected size and needed to be expanded on the fly.

There are a few different types of success scenarios that we need to plan for when architecting an OpenStack cloud. First, we need to plan for growth in the number of instances. This is relatively straightforward: each additional instance grows the size of the database, grows the amount of metering data in Ceilometer, and, most importantly, grows the number of compute nodes. Adding compute nodes and reporting puts strain on the message bus, which is typically the limiting factor in the size of OpenStack regions or cells. We'll talk more about this when we discuss dividing up OpenStack clouds into regions, cells, and Availability Zones. The second type of growth we need to plan for is an increase in the number of API calls.
Deployments which support Continuous Integration (CI) development environments might have (relatively) small compute requirements, but CI typically brings up and tears down environments rapidly. This generates a large amount of API traffic, which in turn generates a large amount of database and message traffic. In hosting environments, end users might also manually generate a lot of API traffic as they bring instances up and down, or manually check the status of deployments they've already launched. While a service catalog might check the status of instances it has launched on a regular basis, humans tend to hit refresh on their browsers in an erratic fashion. Automated testing of the platform has a tendency to grossly underestimate this kind of behavior.

With that in mind, any pattern that we adopt will need to provide for the following requirements:

API services must continue to be available during a hardware failure in the control plane.
The systems which provide API services must be horizontally scalable (and ideally elastic) to respond to unanticipated demands.
The database services must be vertically or horizontally scalable to respond to unanticipated growth of the platform.
The message bus can be either vertically or horizontally scaled, depending on the technology chosen.

Finally, every system has its limits, and these limits should be defined in the architecture documentation so that capacity planning can account for them. At some point, the control plane has scaled as far as it can and a second control plane should be deployed to provide additional capacity. Although OpenStack is designed to be massively scalable, it isn't designed to be infinitely scalable.

High availability patterns for the control plane

There are three approaches commonly used in OpenStack deployments these days for achieving high availability of the control plane.

The first is the simplest: take the single-node cloud controller, virtualize it, and then make the virtual machine highly available using either VMware clustering or Linux clustering. While this option is simple and it provides for failure scenarios, it scales vertically (not horizontally) and doesn't provide for success scenarios. As such, it should only be used in regions with a limited number of compute nodes and a limited number of API calls. In practice, this method isn't used frequently and we won't spend any more time on it here.

The second pattern provides for H/A, but not horizontal scalability. This is the "Active/Passive" scenario described in the OpenStack High Availability Guide. At Red Hat, we used this a lot with our Folsom and Grizzly deployments, but moved away from it starting with Havana. It's similar to the virtualization solution described earlier, but instead of relying on VMware clustering or Linux clustering to restart a failed virtual machine, it relies on Linux clustering to restart failed services on a second cloud controller node, also running the same subset of services. This pattern doesn't provide for success scenarios in the web tier, but can still be used in the database and messaging tiers. Some networking services may still need to be provided as Active/Passive as well.

The third H/A pattern available to OpenStack architectures is the Active/Active pattern. In this pattern, services are horizontally scaled out behind a load balancing service or appliance, which is itself Active/Passive.
As a general rule, most OpenStack services should be enabled as Active/Active where possible to allow for success scenarios while mitigating failure scenarios. Ideally, Active/Active services can be scaled out elastically without service disruption by simply adding additional control plane nodes. Both the Active/Passive and Active/Active designs require clustering software to determine the health of services and the hosts on which they run. In this article, we'll be using Pacemaker as the cluster manager. Some architects may choose to use Keepalived instead of Pacemaker.

Active/Passive service configuration

In the Active/Passive service configuration, the service is configured and deployed to two or more physical systems. The service is associated with a Virtual IP (VIP) address. A cluster resource manager (normally Pacemaker) is used to ensure that the service and its VIP are enabled on only one of the two systems at any point in time. The resource manager may be configured to favor one of the machines over the other. When the machine that the service is running on fails, the resource manager first ensures that the failed machine is no longer running and then starts the service on the second machine.

Ensuring that the failed machine is no longer running is accomplished through a process known as fencing. Fencing usually entails powering off the machine using the management interface on the BIOS. The fence agent may also talk to a power supply connected to the failed server to ensure that the system is down.

Some services (such as the Glance image registry) require shared storage to operate. If the storage is network-based, such as NFS, it may be mounted on both the active and the passive nodes simultaneously. If the storage is block-based, such as iSCSI, it will only be mounted on the active node, and the resource manager will ensure that the storage migrates with the service and the VIP.

Active/Active service configuration

Most of the OpenStack API services are designed to be run on more than one system simultaneously. This configuration, the Active/Active configuration, requires a load balancer to spread traffic across each of the active services. The load balancer manages the VIP for the service and ensures that the backend systems are listening before forwarding traffic to them. The cluster manager ensures that the VIP is only active on one node at a time. The backend services may or may not be managed by the cluster manager in the Active/Active configuration. Service or system failure is detected by the load balancer, and failed services are taken out of rotation.

There are a few advantages to the Active/Active service configuration, which are as follows:

The first advantage is that it allows for horizontal scalability. If additional capacity is needed for a given service, a new system running the service can be brought up and added into rotation behind the load balancer without any downtime. The control plane may also be scaled down without downtime in the event that it was over-provisioned.
The second advantage is that Active/Active services have a much shorter mean time to recovery. Fencing operations often take up to 2 minutes, and fencing is required before the cluster resource manager will move a service from a failed system to a healthy one. Load balancers can immediately detect system failure and stop sending requests to unresponsive nodes while the cluster manager fences them in the background.
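As a concrete illustration of the Active/Passive building block, a virtual IP managed by Pacemaker is usually created with a single resource definition. A minimal sketch using the pcs shell — the resource name and address below are illustrative, not taken from the original text:

$ pcs resource create control-vip ocf:heartbeat:IPaddr2 ip=192.168.1.10 cidr_netmask=24 op monitor interval=30s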
Whenever possible, architects should employ the Active/Active pattern for the control plane services.

OpenStack service specifics

In this section, we'll walk through each of the OpenStack services and outline the H/A strategy for them. While most of the services can be configured as Active/Active behind a load balancer, some of them must be configured as Active/Passive and others may be configured either way. Some of the configuration also depends on the particular version of OpenStack, especially for Ceilometer, Heat, and Neutron. The following details are current as of the Liberty release of OpenStack.

The OpenStack web services

As a general rule, all of the web services and the Horizon dashboard may be run Active/Active. These include the API services for Keystone, Glance, Nova, Cinder, Neutron, Heat, and Ceilometer. The scheduling services for Nova, Cinder, Neutron, Heat, and Ceilometer may also be deployed Active/Active. These scheduling services do not require a load balancer, as they respond to requests on the message bus. The only web service which must be run Active/Passive is the Ceilometer Central agent. This service can, however, be configured to split its workload among multiple instances to support scaling horizontally.

The database services

All state for the OpenStack web services is stored in a central database—usually a MySQL database. MySQL is usually deployed in an Active/Passive configuration, but can be made Active/Active with the Galera replication extension. Galera is clustering software for MySQL (MariaDB in OpenStack) which uses synchronous replication to achieve H/A. However, even with Galera, we still recommend directing writes to only one of the replicas—some queries used by the OpenStack services may deadlock when writing to more than one master. With Galera, a load balancer is typically deployed in front of the cluster and configured to deliver traffic to only one replica at a time. This configuration reduces the mean time to recovery of the service while ensuring that the data is consistent.

In practice, many organizations will defer to the database architects for their preference regarding highly available MySQL deployments. After all, it is typically the database administration team who is responsible for responding to failures of that component.

Deployments which use the Ceilometer service also require a MongoDB database to store telemetry data. MongoDB is horizontally scalable by design and is typically deployed Active/Active with at least three replicas.

The message bus

All OpenStack services communicate through the message bus. Most OpenStack deployments these days use the RabbitMQ service as the message bus. RabbitMQ can be configured to be Active/Active through a facility known as "mirrored queues". The RabbitMQ service is not load balanced; each service is given a list of potential nodes, and the client is responsible for determining which nodes are active and which ones have failed. Other messaging services used with OpenStack, such as ZeroMQ, ActiveMQ, or Qpid, may have different strategies and configurations for achieving H/A and horizontal scalability. For those services, refer to their documentation to determine the optimal architecture.
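For RabbitMQ specifically, mirrored queues are enabled by defining a policy on the broker. A minimal sketch — the policy name and queue pattern here mirror every queue, which is a common starting point rather than a tuned production configuration:

$ rabbitmqctl set_policy ha-all "^" '{"ha-mode":"all"}'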
Compute, storage, and network agents

The compute, storage, and network components in OpenStack have a set of agent services which perform the work scheduled by the API services. These services register themselves with the schedulers on start-up over the message bus. The schedulers are responsible for determining the health of the services and for scheduling work to active services. The compute and storage services are all designed to be run Active/Active, but the network services need some extra consideration.

Each hypervisor in an OpenStack deployment runs the nova-compute service. When this service starts up, it registers itself with the nova-scheduler service. A list of currently available nova services is available via the nova service-list command. If a compute node is unavailable, its state is listed as down and the scheduler skips it when performing instance actions. When the node becomes available again, the scheduler includes it in the list of available hosts. For KVM or Xen-based deployments, the nova-compute service runs once per hypervisor and is not made highly available. For VMware-based deployments, though, a single nova-compute service is run for every vSphere cluster. As such, this service should be made highly available in an Active/Passive configuration. This is typically done by virtualizing the service within a vSphere cluster and configuring the virtual machine to be highly available.

Cinder includes a service known as the volume service, or cinder-volume. The volume service registers itself with the Cinder scheduler on startup and is responsible for creating, modifying, or deleting LUNs on block storage devices. For backends which support multiple writers, multiple copies of this service may be run in an Active/Active configuration. The LVM backend (which is the reference backend) is not highly available, though, and may only have one cinder-volume service for each block device. This is because the LVM backend is responsible for providing iSCSI access to a locally attached storage device. For this reason, highly available deployments of OpenStack should avoid the LVM Cinder backend and instead use a backend that supports multiple cinder-volume services.

Finally, the Neutron component of OpenStack has a number of agents, which all require some special consideration for highly available deployments. The DHCP agent can be configured as highly available, and the number of agents which will respond to DHCP requests for each subnet is governed by a parameter in the neutron.conf file, dhcp_agents_per_network. This is typically set to 2, regardless of the number of DHCP agents which are configured to run in a control plane. For most of the history of OpenStack, the L3 routing agent in Neutron has been a single point of failure. It could be made highly available in an Active/Passive configuration, but its failover meant the interruption of network connections in the tenant space. Many of the third-party Neutron plugins have addressed this in different ways, and the reference Open vSwitch plugin has a highly available L3 agent as of the Juno release. For details on implementing a solution to the single routing point of failure using OpenStack's Distributed Virtual Routers (DVR), refer to the OpenStack Foundation's Neutron documentation at http://docs.openstack.org/liberty/networking-guide/scenario-dvr-ovs.html.
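To see which of these agents and services the schedulers currently consider healthy, the standard client tools can be queried directly. A minimal sketch — both commands assume admin credentials are already loaded into the shell:

$ nova service-list      # shows nova-compute, nova-scheduler, and so on, with an up or down state per host
$ neutron agent-list     # shows the DHCP, L3, and Open vSwitch agents along with their alive status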
Regions, cells, and Availability Zones

As we mentioned before, OpenStack is designed to be scalable, but not infinitely scalable. There are three different techniques architects can use to segregate an OpenStack cloud: regions, cells, and Availability Zones. In this section, we'll walk through how each of these concepts maps to hypervisor topologies.

Regions

From an end user's perspective, OpenStack regions are equivalent to regions in Amazon Web Services. Regions live in separate data centers and are often named after their geographical location. If your organization has a data center in Phoenix and one in Raleigh (like ours does), you'll have at least a PHX and an RDU region. Users who want to geographically disperse their workloads will place some of them in PHX and some of them in RDU. Regions have separate API endpoints, and although the Horizon UI has some support for multiple regions, they are essentially entirely separate deployments.

From an architectural standpoint, there are two main design choices for implementing regions, which are as follows:

The first is around authentication. Users will want to have the same credentials for accessing each of the OpenStack regions. There are a few ways to accomplish this. The simplest is to use a common backing store (usually LDAP) for the Keystone service in each region. In this scenario, the user has to authenticate separately to each region to get a token, but the credentials are the same. In Juno and later, Keystone also supports federation across regions: a Keystone token granted by one region can be presented to another region to authenticate a user. While this currently isn't widely used, it is a major focus area for the OpenStack Foundation and will probably see broader adoption in the future.

The second major consideration for regional architectures is whether or not to present a single set of Glance images to each region. While work is currently being done to replicate Glance images across federated clouds, most organizations manually ensure that the shared images are consistent. This typically involves building a workflow around image publishing and deprecation which is mindful of the regional layout. Another option for ensuring consistent images across regions is to implement a central image repository using Swift, which also requires shared Keystone and Glance services that span multiple data centers. Details on how to design multiple regions with shared services are in the OpenStack Architecture Design Guide.

Cells

The Nova compute service has a concept of cells, which can be used to segregate large pools of hypervisors within a single region. This technique is primarily used to mitigate the scalability limits of the OpenStack message bus. The deployment at CERN makes wide use of cells to achieve massive scalability within single regions. Support for cells varies from service to service, and as such cells are infrequently used outside a few very large cloud deployments. The CERN deployment is well documented and should be used as a reference for these types of deployments.

In our experience, it's much simpler to deploy multiple regions within a single data center than to implement cells to achieve large scale. The added inconvenience of presenting your users with multiple API endpoints within a geographic location is typically outweighed by the benefits of having a more robust platform. If multiple control planes are available in a geographic region, the failure of a single control plane becomes less dramatic. The cells architecture has its own set of challenges with regard to networking and the scheduling of instance placement, and some very large companies that support the OpenStack effort have been working for years to overcome these hurdles. In the meantime, many OpenStack distributions are working on a new control plane design.
These new designs split the OpenStack control plane into containers running the OpenStack services in a microservice-type architecture. This way the services themselves can be placed anywhere and scaled horizontally based on the load. One architecture that has garnered a lot of attention lately is the Kolla project, which promotes Docker containers and Ansible playbooks to provide production-ready containers and deployment tools for operating OpenStack clouds. To see more, go to https://wiki.openstack.org/wiki/Kolla.

Availability Zones

Availability Zones are used to group hypervisors within a single OpenStack region. Availability Zones are exposed to the end user and should be used to provide the user with an indication of the underlying topology of the cloud. The most common use case for Availability Zones is to expose failure zones to the user. To ensure the H/A of a service deployed on OpenStack, a user will typically want to deploy the various components of their service onto hypervisors in different racks. This way, the failure of a top-of-rack switch or a PDU will only bring down a portion of the instances which provide the service. Racks form a natural boundary for Availability Zones for this reason.

There are a few other interesting uses of Availability Zones apart from exposing failure zones to the end user. One financial services customer we work with had a requirement for the instances of each line of business to run on dedicated hardware. A combination of Availability Zones and the AggregateMultiTenancyIsolation Nova scheduler filter was used to ensure that each tenant had access to dedicated compute nodes. Availability Zones can also be used to expose hardware classes to end users. For example, hosts with faster processors might be placed in one Availability Zone and hosts with slower processors in another. This allows end users to decide where to place their workloads based upon compute requirements.

Updating the design document

In this article, we walked through the different approaches and considerations for achieving H/A and scalability in OpenStack deployments. As cloud architects, we need to decide on the correct approach for our deployment and then document it thoroughly so that it can be evaluated by the larger team in our organization. Each of the major OpenStack vendors has a reference architecture for highly available deployments, and those should be used as a starting point for the design. The design should then be integrated with the existing Enterprise Architecture and modified to ensure that the best practices established by the various stakeholders within the organization are followed. For example, the system administrators within an organization may be more comfortable supporting Pacemaker than Keepalived. The design document presents the choices made for each of these key technologies and gives the stakeholders an opportunity to comment on them before the deployment.

Planning the physical architecture

The simplest way to achieve H/A is to add additional cloud controllers to the deployment and cluster them. Other deployments may choose to segregate services into different host classes, which can then be clustered. This may include separating the database services onto database nodes, the messaging services onto messaging nodes, and the memcached service onto memcache nodes. Load balancing services might live on their own nodes as well.
The primary considerations for mapping scalable services to physical (or virtual) hosts are the following:

Does the service scale horizontally or vertically?
Will vertically scaling the service impede the performance of other co-located services?
Does the service have particular hardware or network requirements that other services don't have?

For example, some OpenStack deployments which use the HAProxy load balancing service choose to separate the load balancing nodes out onto separate hardware. The VIPs which the load balancing nodes host must live on a public, routed network, while the internal IPs of the services that they route to don't have that requirement. Putting the HAProxy service on separate hosts allows the rest of the control plane to use only private addressing.

Grouping all of the API services on dedicated hosts may ease horizontal scalability. These services don't need to be managed by a cluster resource manager and can be scaled by adding additional nodes to the load balancers without having to update cluster definitions.

Database services have high I/O requirements. Segregating these services onto machines which have access to high-performance Fibre Channel storage may make sense.

Finally, you should consider whether or not to virtualize the control plane. If the control plane will be virtualized, creating additional host groups to host dedicated services becomes very attractive. Having eight or nine virtual machines dedicated to the control plane is a very different proposition than having eight or nine physical machines dedicated to it. Most highly available control planes require at least three nodes to ensure that quorum is easily determined by the cluster resource manager. While dedicating three physical nodes to the control function of a hundred-node OpenStack deployment makes a lot of sense, dedicating nine physical nodes may not. Many of the organizations that we've worked with will already have a VMware-based cluster available for hosting management appliances, and the control plane can be deployed within that existing footprint. Organizations which are deploying a KVM-only cloud may not want to incur the additional operational complexity of managing the additional virtual machines outside OpenStack.

Updating the physical architecture design

Once the mapping of services to physical (or virtual) machines has been determined, the design document should be updated to include definitions of the host groups and their associated functions.
A simple example is provided as follows:

Load balancer: These systems provide the load balancing services in an Active/Passive configuration.
Cloud controller: These systems provide the API services, the scheduling services, and the Horizon dashboard services in an Active/Active configuration.
Database node: These systems provide the MySQL database services in an Active/Passive configuration.
Messaging node: These systems provide the RabbitMQ messaging services in an Active/Active configuration.
Compute node: These systems act as KVM hypervisors and run the nova-compute and openvswitch-agent services.

Deployments which will use only the cloud controller host group might use the following definitions:

Cloud controller: These systems provide the load balancing services in an Active/Passive configuration and the API services, MySQL database services, and RabbitMQ messaging services in an Active/Active configuration.
Compute node: These systems act as KVM hypervisors and run the nova-compute and openvswitch-agent services.

After defining the host groups, the physical architecture diagram should be updated to reflect the mapping of host groups to physical machines in the deployment, including considerations for network connectivity. An example architecture diagram showing this mapping should be included in the design document.

Summary

A complete guide to implementing H/A of the OpenStack services is probably worth a book in itself. In this article we started out by covering the main strategies for making OpenStack services highly available and which strategies apply well to each service. Then we covered how OpenStack deployments are typically segmented across physical regions. Finally, we updated our documentation and implemented a few of the technologies we discussed in the lab.

While walking through the main considerations for highly available deployments in this article, we've tried to emphasize a few key points:

Scalability is at least as important as H/A in cluster design. Ensure that your design is flexible in case of unexpected growth.
OpenStack doesn't scale forever. Plan for multiple regions.

Also, it's important to make sure that the strategy and architecture you adopt for H/A is supportable by your organization. Consider reusing existing architectures for H/A in the message bus and database layers.

Resources for Article:

Further resources on this subject:

Neutron API Basics [article]
The OpenFlow Controllers [article]
OpenStack Networking in a Nutshell [article]


Introduction to Ansible

Packt
26 Dec 2016
25 min read
In this article by Walter Bentley, the author of the book OpenStack Administration with Ansible 2 - Second Edition, we get a high-level overview of Ansible 2.0 and the components that make up this open source configuration management tool. We will cover the definition of the Ansible components and their typical use. Also, we will discuss how to define variables for roles and how to define/set facts about the hosts for the playbooks. Next, we will transition into how to set up your Ansible environment and the ways you can define the host inventory used to run your playbooks against. We will then cover some of the new components introduced in Ansible 2.0, named Blocks and Strategies, and review the cloud integrations that are natively part of the Ansible framework. Finally, the article will finish up with a working example of a playbook that confirms the host connectivity required to use Ansible. The following topics are covered:

Ansible 2.0 overview
What are playbooks, roles, and modules?
Setting up the environment
Variables and facts
Defining the inventory
Blocks and Strategies
Cloud integrations

(For more resources related to this topic, see here.)

Ansible 2.0 overview

Ansible, in its simplest form, has been described as a Python-based open source IT automation tool that can be used to configure/manage systems, deploy software (or almost anything), and provide orchestration to a process. These are just a few of the many possible use cases for Ansible. In my previous life as a production support infrastructure engineer, I wish such a tool had existed; I would have surely had much more sleep and a lot fewer gray hairs.

One thing that always stood out to me in regard to Ansible is that the developers' first and foremost goal was to create a tool that offers simplicity and maximum ease of use. In a world filled with complicated and intricate software, keeping it simple goes a long way for most IT professionals.

Staying with the goal of keeping things simple, Ansible handles configuration/management of hosts solely through Secure Shell (SSH). Absolutely no daemon or agent is required. The server or workstation you run the playbooks from only needs Python and a few other packages installed, most of which are likely already present. Honestly, it does not get simpler than that.

The automation code used with Ansible is organized into playbooks and roles, which are written in YAML markup format. Ansible follows the YAML formatting and structure within the playbooks/roles, so being familiar with YAML formatting helps in creating your playbooks/roles. If you are not familiar, do not worry, as it is very easy to pick up (it is all about the spaces and dashes).

The playbooks and roles are in a non-compiled format, making the code very simple to read if you are familiar with standard Unix/Linux commands. There is also a suggested directory structure for creating playbooks. This is also one of my favorite features of Ansible: it enables the ability to review and/or use playbooks written by anyone else with little to no direction needed.

It is strongly suggested that you review the Ansible playbook best practices before getting started: http://docs.ansible.com/playbooks_best_practices.html. I also find the overall Ansible website very intuitive and filled with great examples, at http://docs.ansible.com. My favorite excerpt from the Ansible playbook best practices is under the Content Organization section.
Having a clear understanding of how to organize your automation code proved very helpful to me. The suggested directory layout for playbooks is as follows:

group_vars/
   group1              # here we assign variables to particular groups
   group2              # ""
host_vars/
   hostname1           # if systems need specific variables, put them here
   hostname2           # ""
library/               # if any custom modules, put them here (optional)
filter_plugins/        # if any custom filter plugins, put them here (optional)
site.yml               # master playbook
webservers.yml         # playbook for webserver tier
dbservers.yml          # playbook for dbserver tier
roles/
    common/            # this hierarchy represents a "role"
        tasks/
            main.yml   # <-- tasks file can include smaller files if warranted
        handlers/
            main.yml   # <-- handlers file
        templates/     # <-- files for use with the template resource
            ntp.conf.j2    # <------- templates end in .j2
        files/
            bar.txt    # <-- files for use with the copy resource
            foo.sh     # <-- script files for use with the script resource
        vars/
            main.yml   # <-- variables associated with this role
        defaults/
            main.yml   # <-- default lower priority variables for this role
        meta/
            main.yml   # <-- role dependencies

It is now time to dig deeper into reviewing what playbooks, roles, and modules consist of. This is where we will break down each of these components' distinct purposes.

What are playbooks, roles, and modules?

The automation code you will create to be run by Ansible is broken down into hierarchical layers. Envision a pyramid with its multiple levels of elevation. We will start at the top and discuss playbooks first.

Playbooks

Imagine that a playbook is the very topmost triangle of the pyramid. A playbook takes on the role of executing all of the lower-level code contained in a role. It can also be seen as a wrapper for the roles created; we will cover roles in the next section. Playbooks also contain other high-level runtime parameters, such as the host(s) to run the playbook against, the root user to use, and/or whether the playbook needs to be run as a sudo user. These are just a few of the many playbook parameters you can add. Below is an example of what the syntax of a playbook looks like:

---
# Sample playbook structure/syntax.

- hosts: dbservers
  remote_user: root
  become: true

  roles:
    - mysql-install

In the preceding example, you will note that the playbook begins with ---. This is required as the heading (line 1) for each playbook and role. Also, please note the spacing structure at the beginning of each line. The easiest way to remember it is that each main command starts with a dash (-), and every subcommand starts with two more spaces, repeating the deeper you go in the code hierarchy. As we walk through more examples, it will start to make more sense.

Let's step through the preceding example and break down the sections. The first step in the playbook was to define what hosts to run the playbook against; in this case, it was dbservers (which can be a single host or a list of hosts). The next area sets the user to run the playbook as, locally and remotely, and enables executing the playbook as sudo. The last section of the syntax lists the roles to be executed. The earlier example is similar to the formatting of the other playbooks. This format incorporates defining roles, which allows for scaling out playbooks and reusability (you will find the most advanced playbooks structured this way).
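Running a playbook like this one is a single command: point ansible-playbook at the playbook file and at an inventory that contains the dbservers group. A minimal sketch — the hosts inventory file and the mysql.yml playbook name are hypothetical, not part of the original example:

$ ansible-playbook -i hosts mysql.yml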
With Ansible's high level of flexibility, you can also create playbooks in a simpler, consolidated format. An example of that format is as follows:

---
# Sample simple playbook structure/syntax

- name: Install MySQL Playbook
  hosts: dbservers
  remote_user: root
  become: true

  tasks:
    - name: Install MySQL
      apt: name={{item}} state=present
      with_items:
        - libselinux-python
        - mysql
        - mysql-server
        - MySQL-python

    - name: Copying my.cnf configuration file
      template: src=cust_my.cnf dest=/etc/my.cnf mode=0755

    - name: Prep MySQL db
      command: chdir=/usr/bin mysql_install_db

    - name: Enable MySQL to be started at boot
      service: name=mysqld enabled=yes state=restarted

    - name: Prep MySQL db
      command: chdir=/usr/bin mysqladmin -u root password 'passwd'

Now that we have reviewed what playbooks are, we will move on to reviewing roles and their benefits.

Roles

Moving down to the next level of the Ansible pyramid, we will discuss roles. The most effective way to describe roles is as the breaking up of a playbook into multiple smaller files. So, instead of having one long playbook with multiple tasks defined, all handling separate but related steps, you can break the playbook into individual, specific roles. This format keeps your playbooks simple and leads to the ability to reuse roles between playbooks.

The best advice I personally received concerning creating roles is to keep them simple. Try to create a role to do a specific function, such as just installing a software package. You can then create a second role to just do configurations. In this format, you can reuse the initial installation role over and over without needing to make code changes for the next project.

The typical syntax of a role can be found here; it would be placed into a file named main.yml within the roles/<name of role>/tasks directory:

---
- name: Install MySQL
  apt: name={{item}} state=present
  with_items:
    - libselinux-python
    - mysql
    - mysql-server
    - MySQL-python

- name: Copying my.cnf configuration file
  template: src=cust_my.cnf dest=/etc/my.cnf mode=0755

- name: Prep MySQL db
  command: chdir=/usr/bin mysql_install_db

- name: Enable MySQL to be started at boot
  service: name=mysqld enabled=yes state=restarted

- name: Prep MySQL db
  command: chdir=/usr/bin mysqladmin -u root password 'passwd'

The complete structure of a role is identified in the directory layout found in the Ansible 2.0 overview section of this article. We will review additional functions of roles as we step through the working examples. Having covered playbooks and roles, we are prepared to cover the last topic in this section, which is modules.

Modules

Another key feature of Ansible is that it comes with predefined code that can control system functions, named modules. Modules are executed directly against the remote host(s) or via playbooks. The execution of a module generally requires you to pass a set number of arguments. The Ansible website (http://docs.ansible.com/modules_by_category.html) does a great job of documenting every available module and the possible arguments to pass to it. The documentation for each module can also be accessed via the command line by executing the command ansible-doc <module name>.

The use of modules will always be the recommended approach within Ansible, as they are written to avoid making the requested change to the host unless the change actually needs to be made. This is very useful when re-executing a playbook against a host more than once. The modules are smart enough to know not to re-execute any steps that have already completed successfully, unless some argument or command is changed.
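Both of those access paths can be tried straight from the command line. A minimal sketch — the hosts inventory file is a hypothetical one containing the dbservers group from the examples above:

$ ansible-doc apt                      # shows the documented arguments for the apt module
$ ansible dbservers -i hosts -m ping   # runs the ping module ad hoc against the dbservers group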
Another thing worth noting is that with every new release of Ansible, additional modules are introduced. Personally, an exciting addition to Ansible 2.0 is the updated and extended set of modules aimed at easing the management of your OpenStack cloud. Referring back to the role example shared earlier, you will note the use of various modules. The modules used are highlighted here again to provide further clarity:

---
- name: Install MySQL
  apt: name={{item}} state=present
  with_items:
    - libselinux-python
    - mysql
    - mysql-server
    - MySQL-python

- name: Copying my.cnf configuration file
  template: src=cust_my.cnf dest=/etc/my.cnf mode=0755

- name: Prep MySQL db
  command: chdir=/usr/bin mysql_install_db

- name: Enable MySQL to be started at boot
  service: name=mysqld enabled=yes state=restarted
...

Another feature worth mentioning is that you are not only able to use the current modules, but you can also write your very own. Although the core of Ansible is written in Python, your modules can be written in almost any language. Underneath it all, the modules technically return JSON-format data, thus allowing for the language flexibility.

In this section, we were able to cover the top two sections of our Ansible pyramid, playbooks and roles. We also reviewed the use of modules, that is, the built-in power behind Ansible. Next, we transition into another key feature of Ansible—variable substitution and gathering host facts.

Setting up the environment

Before you can start experimenting with Ansible, you must install it first. There is no need to duplicate all the great installation documentation already available at http://docs.ansible.com/. I would encourage you to go to the following URL and choose an install method of your choice: http://docs.ansible.com/ansible/intro_installation.html.

If you are installing Ansible on Mac OS, I found using Homebrew much simpler and more consistent. More details on using Homebrew can be found at http://brew.sh. The command to install Ansible with Homebrew is brew install ansible.

Upgrading to Ansible 2.0

It is very important to note that in order to use the new features that are part of Ansible version 2.0, you must update the version running on your OSA deployment node. The version currently running on the deployment node is either 1.9.4 or 1.9.5. The method that seemed to work well every time is outlined here. This part is a bit experimental, so please make a note of any warnings or errors incurred.

From the deployment node, execute the following commands:

$ pip uninstall -y ansible
$ sed -i 's/^export ANSIBLE_GIT_RELEASE.*/export ANSIBLE_GIT_RELEASE=${ANSIBLE_GIT_RELEASE:-"v2.1.1.0-1"}/' /opt/openstack-ansible/scripts/bootstrap-ansible.sh
$ cd /opt/openstack-ansible
$ ./scripts/bootstrap-ansible.sh

New OpenStack client authentication

Alongside the introduction of the new python-openstackclient CLI was the unveiling of the os-client-config library. This library offers an additional way to provide/configure authentication credentials for your cloud. The new OpenStack modules that are part of Ansible 2.0 leverage this library through a package named shade. Through the use of os-client-config and shade, you can now manage multiple cloud credentials within a single file named clouds.yml. When deploying OSA, I discovered that shade will search for this file in the $HOME/.config/openstack/ directory, wherever the playbook/role or CLI command is executed.
A working example of the clouds.yml file is shown as follows:

# Ansible managed: /etc/ansible/roles/openstack_openrc/templates/clouds.yaml.j2 modified on 2016-06-16 14:00:03 by root on 082108-allinone02
clouds:
  default:
    auth:
      auth_url: http://172.29.238.2:5000/v3
      project_name: admin
      tenant_name: admin
      username: admin
      password: passwd
      user_domain_name: Default
      project_domain_name: Default
    region_name: RegionOne
    interface: internal
    identity_api_version: "3"

Using this new authentication method drastically simplifies creating automation code to work against an OpenStack environment. Instead of passing a series of authentication parameters in line with the command, you can just pass a single parameter, --os-cloud=default. The Ansible OpenStack modules can also use this new authentication method. More details about os-client-config can be found at http://docs.openstack.org/developer/os-client-config.

Installing shade is required to use the Ansible OpenStack modules in version 2.0. Shade will need to be installed directly on the deployment node and on the Utility container (if you decide to use this option). If you encounter problems installing shade, try the command pip install shade --isolated.

Variables and facts

Anyone who has ever attempted to create some sort of automation code, whether via bash or Perl scripts, knows that being able to define variables is an essential component. Although Ansible does not compare to the other programming languages mentioned, it does contain some core programming language features, such as variable substitution.

Variables

To start, let's first define the meaning of a variable, in the event this is a new concept:

Variable (computer science): a symbolic name associated with a value and whose associated value may be changed.

Using variables allows you to set a symbolic placeholder in your automation code that you can substitute values for on each execution. Ansible accommodates defining variables within your playbooks and roles in various ways. When dealing with OpenStack and/or cloud technologies in general, being able to adjust your execution parameters on the fly is critical. We will step through a few ways you can set variable placeholders in your playbooks, how to define variable values, and how you can register the result of a task as a variable.

Setting variable placeholders

If you wanted to set a variable placeholder within your playbooks, you would add the following syntax:

- name: Copying my.cnf configuration file
  template: src=cust_my.cnf dest={{ CONFIG_LOC }} mode=0755

In the preceding example, the variable CONFIG_LOC was added in the place of the configuration file location (/etc/my.cnf) designated in the earlier example. When setting the placeholder, the variable name must be encased within {{ }}, as shown in the example.

Defining variable values

Now that you have added the variable to your playbook, you must define its value. This can be done easily by passing command-line values as follows:

$ ansible-playbook base.yml --extra-vars "CONFIG_LOC=/etc/my.cnf"

Or you can define the values directly in your playbook, within each role, or include them inside global playbook variable files. Here are examples of the three options.

Define a variable value directly in your playbook by adding the vars section:

---
# Sample simple playbooks structure/syntax

- name: Install MySQL Playbook
  hosts: dbservers
  ...
  vars:
    CONFIG_LOC: /etc/my.cnf
  ...
Define a variable value within each role by creating a variable file named main.yml within the vars/ directory of the role, with the following contents:

---
CONFIG_LOC: /etc/my.cnf

To define the variable value inside the global playbook, you would first create a host-specific variable file within the group_vars/ directory in the root of the playbook directory, with the exact same contents as mentioned earlier. In this case, the variable file must be named to match the host or host group name defined within the hosts file. As in the earlier example, the host group name is dbservers; in turn, a file named dbservers would be created within the group_vars/ directory.

Registering variables

The situation at times arises when you want to capture the output of a task. Within the process of capturing the result, you are in essence registering a dynamic variable. This type of variable is slightly different from the standard variables we have covered so far. Here is an example of registering the result of a task to a variable:

- name: Check Keystone process
  shell: ps -ef | grep keystone
  register: keystone_check

The registered variable value data structure can be stored in a few formats. It will always follow a base JSON format, but the value can be stored under different attributes. Personally, I have found it difficult at times to blindly determine the format; the tip given here will save you hours of troubleshooting.

To review the data structure of a registered variable when running a playbook, you can use the debug module, such as adding this to the previous example: - debug: var=keystone_check.

Facts

When Ansible runs a playbook, one of the first things it does on your behalf is gather facts about the host before executing tasks or roles. The information gathered about the host will range from base information such as operating system and IP addresses to detailed information such as the hardware type/resources. The details captured are then stored in a variable named facts.

You can find a complete list of available facts on the Ansible website at: http://docs.ansible.com/playbooks_variables.html#information-discovered-from-systems-facts.

You have the option to disable the fact-gathering process by adding the following to your playbook: gather_facts: false. Facts about a host are captured by default unless the feature is disabled. For a quick way of viewing all facts associated with a host, you can manually execute the following via the command line:

$ ansible dbservers -m setup

There is plenty more you can do with facts, and I would encourage you to take some time reviewing them in the Ansible documentation. Next, we will learn more about the base of our pyramid, the host inventory. Without an inventory of hosts to run the playbooks against, you would be creating the automation code for nothing. So, to close out this article, we will dig deeper into how Ansible handles host inventory, whether it be in a static and/or dynamic format.

Defining the inventory

The process of defining a collection of hosts to Ansible is named the inventory. A host can be defined using its fully qualified domain name (FQDN), local hostname, and/or its IP address. Since Ansible uses SSH to connect to the hosts, you can provide any alias for the host that the machine where Ansible is installed can understand. Ansible expects the inventory file to be in an INI-like format and named hosts.
By default, the inventory file is usually located in the /etc/ansible directory and will look as follows:

athena.example.com

[ocean]
aegaeon.example.com
ceto.example.com

[air]
aeolus.example.com
zeus.example.com
apollo.example.com

Personally, I have found the default inventory file to be located in different places depending on the operating system Ansible is installed on. With that point, I prefer to use the -i command-line option when executing a playbook. This allows me to designate the specific hosts file location. A working example would look like this: ansible-playbook -i hosts base.yml.

In the preceding example, there is a single host and a group of hosts defined. The hosts are grouped together into a group by defining a group name enclosed in [ ] inside the inventory file. Two groups are defined in the earlier-mentioned example—ocean and air.

In the event that you do not have any hosts within your inventory file (such as in the case of running a playbook locally only), you can add the following entry to define localhost, like this:

[localhost]
localhost ansible_connection=local

The option exists to define variables for hosts and groups inside your inventory file. More information on how to do this and additional inventory details can be found on the Ansible website at http://docs.ansible.com/intro_inventory.html.

Dynamic inventory

It seemed appropriate, since we are automating functions on a cloud platform, to review yet another great feature of Ansible, which is the ability to dynamically capture an inventory of hosts/instances. One of the primary principles of cloud is to be able to create instances on demand directly via an API, GUI, CLI, and/or through automation code, like Ansible. That basic principle makes relying on a static inventory file pretty much a useless choice. This is why you will need to rely heavily on dynamic inventory.

A dynamic inventory script can be created to pull information from your cloud at runtime and then, in turn, use that information for the playbook's execution. Ansible provides the functionality to detect if an inventory file is set as an executable and, if so, will execute the script to pull current inventory data. Since creating an Ansible dynamic inventory script is considered more of an advanced activity, I am going to direct you to the Ansible website (http://docs.ansible.com/intro_dynamic_inventory.html), as they have a few working examples of dynamic inventory scripts there.

Fortunately, in our case, we will be reviewing an OpenStack cloud built using the openstack-ansible (OSA) repository. OSA comes with a prebuilt dynamic inventory script that will work for your OpenStack cloud. That script is named dynamic_inventory.py and can be found within the playbooks/inventory directory located in the root OSA deployment folder. First, execute the dynamic inventory script manually to become familiar with the data structure and group names defined (this example assumes that you are in the root OSA deployment directory):

$ cd playbooks/inventory
$ ./dynamic_inventory.py

This will print to the screen an output similar to this:

...
},
"compute_all": {
  "hosts": [
    "compute1_rsyslog_container-19482f86",
    "compute1",
    "compute2_rsyslog_container-dee00ea5",
    "compute2"
  ]
},
"utility_container": {
  "hosts": [
    "infra1_utility_container-c5589031"
  ]
},
"nova_spice_console": {
  "hosts": [
    "infra1_nova_spice_console_container-dd12200f"
  ],
  "children": []
},
...
Next, with this information, you now know that if you wanted to run a playbook against the utility container, all you would have to do is execute the playbook like this:

$ ansible-playbook -i inventory/dynamic_inventory.py playbooks/base.yml -l utility_container

Blocks & Strategies

In this section, we will cover two new features added to version 2.0 of Ansible. Both features add additional functionality to how tasks are grouped or executed within a playbook. So far, they seem to be really nice features when creating more complex automation code. We will now briefly review each of the two new features.

Blocks

The Block feature can simply be explained as a way of logically grouping tasks together, with the option of also applying customized error handling. It gives the option to group a set of tasks together, establishing specific conditions and privileges. An example of applying the block functionality to an earlier example can be found here:

---
# Sample simple playbooks structure/syntax

- name: Install MySQL Playbook
  hosts: dbservers
  tasks:
    - block:
        - apt: name={{item}} state=present
          with_items:
            - libselinux-python
            - mysql
            - mysql-server
            - MySQL-python

        - template: src=cust_my.cnf dest=/etc/my.cnf mode=0755

        - command: chdir=/usr/bin mysql_install_db

        - service: name=mysqld enabled=yes state=restarted

        - command: chdir=/usr/bin mysqladmin -u root password 'passwd'

      when: ansible_distribution == 'Ubuntu'
      remote_user: root
      become: true

Additional details on how to implement Blocks and any associated error handling can be found at http://docs.ansible.com/ansible/playbooks_blocks.html.

Strategies

The Strategy feature allows you to add control over how a play is executed by the hosts. Currently, the default behavior is described as the linear strategy, where all hosts will execute each task before any host moves on to the next task. As of today, the two other strategy types that exist are free and debug. Since strategies are implemented as a new type of plugin to Ansible, more can easily be added by contributing code. Additional details on strategies can be found at http://docs.ansible.com/ansible/playbooks_strategies.html.

A simple example of implementing a strategy within a playbook is as follows:

---
# Sample simple playbooks structure/syntax

- name: Install MySQL Playbook
  hosts: dbservers
  strategy: free
  tasks:
    ...

The new debug strategy is extremely helpful when you need to step through your playbook/role to find something like a missing variable, determine what variable value to supply, or figure out why it may be sporadically failing. These are just a few of the possible use cases. I definitely encourage you to give this feature a try. Here is the URL to more details on the playbook debugger: http://docs.ansible.com/ansible/playbooks_debugger.html.

Cloud integrations

Since cloud automation is the main and most important theme of this article, it only makes sense that we highlight the many different cloud integrations Ansible 2.0 offers right out of the box. Again, this was one of the reasons why I immediately fell in love with Ansible. Yes, the other automation tools also have hooks into many of the cloud providers, but I found that at times they did not work or were not mature enough to leverage. Ansible has gone above and beyond to not fall into that trap. Not saying Ansible has all the bases covered, but it does feel like most are, and that is what matters most to me.
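To give a flavour of what these cloud integrations look like in practice, here is a minimal, hypothetical sketch using the os_server module from the OpenStack set covered next. The cloud name assumes the clouds.yml entry shown earlier, the play targets the [localhost] inventory entry defined earlier, and the image, flavor, and network names are placeholders you would replace with values from your own cloud:

---
- hosts: localhost
  tasks:
    - name: Boot a test instance with the os_server module
      os_server:
        cloud: default           # entry from the clouds.yml example shown earlier
        name: ansible-test-instance
        image: cirros            # placeholder image name
        flavor: m1.small         # placeholder flavor name
        network: private         # placeholder network name
        state: present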
If you have not checked out the cloud modules available for Ansible, take a moment now and take a look at http://docs.ansible.com/ansible/list_of_cloud_modules.html. From time to time check back, as I am confident you will be surprised to find more have been added. I am very proud of my Ansible family for keeping on top of these and making it much easier to write automation code against our clouds.

Specific to OpenStack, a bunch of new modules have been added to the Ansible library as of version 2.0. The extensive list can be found at http://docs.ansible.com/ansible/list_of_cloud_modules.html#openstack. You will note that the biggest changes, from the first version of this book to this one, will be focused on using as many of the new OpenStack modules as possible.

Summary

Let's pause here on exploring the dynamic inventory script capabilities and continue to build upon it as we dissect the working examples. We will create our very first OpenStack administration playbook together. We will start off with a fairly simple task of creating users and tenants. This will also include reviewing a few automation considerations you will need to keep in mind when creating automation code for OpenStack. Ready? OK, let's get started!

Resources for Article:

Further resources on this subject:

AIO setup of OpenStack – preparing the infrastructure code environment [article]
RDO Installation [article]
Creating Multiple Users/Tenants [article]


Provision IaaS with Terraform

Packt
14 Dec 2016
9 min read
In this article by Stephane Jourdan and Pierre Pomes, the authors of Infrastructure as Code (IAC) Cookbook, the following sections will be covered:

Configuring the Terraform AWS provider
Creating and using an SSH key pair to use on AWS
Using AWS security groups with Terraform
Creating an Ubuntu EC2 instance with Terraform

(For more resources related to this topic, see here.)

Introduction

A modern infrastructure often uses multiple providers (AWS, OpenStack, Google Cloud, Digital Ocean, and many others), combined with multiple external services (DNS, mail, monitoring, and others). Many providers propose their own automation tool, but the power of Terraform is that it allows you to manage it all from one place, all using code. With it, you can dynamically create machines at two IaaS providers depending on the environment, register their names at another DNS provider, and enable monitoring at a third-party monitoring company, while configuring the company GitHub account and sending the application logs to an appropriate service. On top of that, it can delegate configuration to those who do it well (configuration management tools such as Chef, Puppet, and so on), all with the same tool. The state of your infrastructure is described, stored, versioned, and shared.

In this article, we'll discover how to use Terraform to bootstrap a fully capable infrastructure on Amazon Web Services (AWS), deploying SSH key pairs and securing IAM access keys.

Configuring the Terraform AWS provider

We can use Terraform with many IaaS providers, such as Google Cloud or Digital Ocean. Here we'll configure Terraform to be used with AWS. For Terraform to interact with an IaaS, it needs to have a provider configured.

Getting ready

To step through this section, you will need the following:

An AWS account with keys
A working Terraform installation
An empty directory to store your infrastructure code
An Internet connection

How to do it…

To configure the AWS provider in Terraform, we'll need the following three files:

A file declaring our variables, an optional description, and an optional default for each (variables.tf)
A file setting the variables for the whole project (terraform.tfvars)
A provider file (provider.tf)

Let's declare our variables in the variables.tf file. We can start by declaring what's usually known as the AWS_DEFAULT_REGION, AWS_ACCESS_KEY_ID, and AWS_SECRET_ACCESS_KEY environment variables:

variable "aws_access_key" {
  description = "AWS Access Key"
}

variable "aws_secret_key" {
  description = "AWS Secret Key"
}

variable "aws_region" {
  default     = "eu-west-1"
  description = "AWS Region"
}

Set the two variables matching the AWS account in the terraform.tfvars file. It's not recommended to check this file into source control: it's better to use an example file instead (that is: terraform.tfvars.example). It's also recommended that you use a dedicated Terraform user for AWS, not the root account keys:

aws_access_key = "< your AWS_ACCESS_KEY >"
aws_secret_key = "< your AWS_SECRET_KEY >"

Now, let's tie all this together into a single file—provider.tf:

provider "aws" {
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
  region     = "${var.aws_region}"
}

Apply the following Terraform command:

$ terraform apply
Apply complete! Resources: 0 added, 0 changed, 0 destroyed.

It only means the code is valid, not that it can really authenticate with AWS (try with a bad pair of keys). For this, we'll need to create a resource on AWS.
You now have a new file named terraform.tfstate that has been created at the root of your repository. This file is critical: it's the stored state of your infrastructure. Don't hesitate to look at it, it's a text file.

How it works…

This first encounter with HashiCorp Configuration Language (HCL), the language used by Terraform, looks pretty familiar: we've declared variables with an optional description for reference. We could have declared them simply with the following:

variable "aws_access_key" {
}

All variables are referenced using the following structure:

${var.variable_name}

If the variable has been declared with a default, as our aws_region has been declared with a default of eu-west-1, this value will be used if there's no override in the terraform.tfvars file.

What would have happened if we didn't provide a safe default for our variable? Terraform would have asked us for a value when executed:

$ terraform apply
var.aws_region
  AWS Region

  Enter a value:

There's more…

We've used values directly inside the Terraform code to configure our AWS credentials. If you're already using AWS on the command line, chances are you already have a set of standard environment variables:

$ echo ${AWS_ACCESS_KEY_ID}
<your AWS_ACCESS_KEY_ID>
$ echo ${AWS_SECRET_ACCESS_KEY}
<your AWS_SECRET_ACCESS_KEY>
$ echo ${AWS_DEFAULT_REGION}
eu-west-1

If not, you can simply set them as follows:

$ export AWS_ACCESS_KEY_ID="123"
$ export AWS_SECRET_ACCESS_KEY="456"
$ export AWS_DEFAULT_REGION="eu-west-1"

Then Terraform can use them directly, and the only code you have to type would be to declare your provider! That's handy when working with different tools. The provider.tf file will then look as simple as this:

provider "aws" {
}

Creating and using an SSH key pair to use on AWS

Now that we have our AWS provider configured in Terraform, let's add an SSH key pair to use on a default account of the virtual machines we intend to launch soon.

Getting ready

To step through this section, you will need the following:

A working Terraform installation
An AWS provider configured in Terraform
A pair of SSH keys, generated somewhere you remember. An example can be under the keys folder at the root of your repo:

$ mkdir keys
$ ssh-keygen -q -f keys/aws_terraform -C aws_terraform_ssh_key -N ''

An Internet connection

How to do it…

The resource we want for this is named aws_key_pair. Let's use it inside a keys.tf file, and paste the public key content:

resource "aws_key_pair" "admin_key" {
  key_name   = "admin_key"
  public_key = "ssh-rsa AAAAB3[…]"
}

This will simply upload your public key to your AWS account under the name admin_key:

$ terraform apply
aws_key_pair.admin_key: Creating...
  fingerprint: "" => "<computed>"
  key_name:    "" => "admin_key"
  public_key:  "" => "ssh-rsa AAAAB3[…]"
aws_key_pair.admin_key: Creation complete

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

If you manually navigate to your AWS account, under EC2 | Network & Security | Key Pairs, you'll now find your key.

Another way to use our key with Terraform and AWS would be to read it directly from the file, and that would show us how to use file interpolation with Terraform.
To do this, let's declare a new empty variable to store our public key in variables.tf:

variable "aws_ssh_admin_key_file" {
}

Initialize the variable to the path of the key in terraform.tfvars:

aws_ssh_admin_key_file = "keys/aws_terraform"

Now let's use it in place of our previous keys.tf code, using the file() interpolation:

resource "aws_key_pair" "admin_key" {
  key_name   = "admin_key"
  public_key = "${file("${var.aws_ssh_admin_key_file}.pub")}"
}

This is a much clearer and more concise way of accessing the content of the public key from the Terraform resource. It's also easier to maintain, as changing the key will only require replacing the file and nothing more.

How it works…

Our first resource, aws_key_pair, takes two arguments (a key name and the public key content). That's how all resources in Terraform work. We used our first file interpolation, using a variable, to show how to use more dynamic code for our infrastructure.

There's more…

Using Ansible, we can create a role to do the same job. Here's how we can manage our EC2 key pair using a variable, under the name admin_key. For simplification, we're using here the three usual environment variables—AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION.

Here's a typical Ansible file hierarchy:

├── keys
│   ├── aws_terraform
│   └── aws_terraform.pub
├── main.yml
└── roles
    └── ec2_keys
        └── tasks
            └── main.yml

In the main file (main.yml), let's declare that our host (localhost) will apply the role dedicated to managing our keys:

---
- hosts: localhost
  roles:
    - ec2_keys

In the ec2_keys main task file, create the EC2 key (roles/ec2_keys/tasks/main.yml):

---
- name: ec2 admin key
  ec2_key:
    name: admin_key
    key_material: "{{ item }}"
  with_file: './keys/aws_terraform.pub'

Execute the code with the following command:

$ ansible-playbook -i localhost main.yml

TASK [ec2_keys : ec2 admin key] ************************************************
ok: [localhost] => (item=ssh-rsa AAAA[…] aws_terraform_ssh)

PLAY RECAP *********************************************************************
localhost : ok=2 changed=0 unreachable=0 failed=0

Using AWS security groups with Terraform

Amazon's security groups are similar to traditional firewalls, with ingress and egress rules applied to EC2 instances. These rules can be updated on demand. We'll create an initial security group allowing ingress Secure Shell (SSH) traffic only for our own IP address, while allowing all outgoing traffic.

Getting ready

To step through this section, you will need the following:

A working Terraform installation
An AWS provider configured in Terraform
An Internet connection

How to do it…

The resource we're using is called aws_security_group. Here's the basic structure:

resource "aws_security_group" "base_security_group" {
  name        = "base_security_group"
  description = "Base Security Group"

  ingress {
  }

  egress {
  }
}

We know we want to allow inbound TCP/22 for SSH only for our own IP (replace 1.2.3.4/32 with yours!), and allow everything outbound. Here's how it looks:

ingress {
  from_port   = 22
  to_port     = 22
  protocol    = "tcp"
  cidr_blocks = ["1.2.3.4/32"]
}

egress {
  from_port   = 0
  to_port     = 0
  protocol    = "-1"
  cidr_blocks = ["0.0.0.0/0"]
}

You can add a Name tag for easier reference later:

tags {
  Name = "base_security_group"
}

Apply this and you're good to go:

$ terraform apply
aws_security_group.base_security_group: Creating...
[…]
aws_security_group.base_security_group: Creation complete

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
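Putting the preceding fragments together, the complete resource block looks roughly like this (the ingress, egress, and tags blocks all sit inside the resource; remember to substitute your own IP address):

resource "aws_security_group" "base_security_group" {
  name        = "base_security_group"
  description = "Base Security Group"

  # Allow inbound SSH only from our own IP address
  ingress {
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["1.2.3.4/32"]
  }

  # Allow all outbound traffic
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags {
    Name = "base_security_group"
  }
}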
You can see your newly created security group by logging into the AWS Console and navigating to EC2 Dashboard | Network & Security | Security Groups.

Another way of accessing the same AWS console information is through the AWS command line:

$ aws ec2 describe-security-groups --group-names base_security_group
{...}

There's more…

We can achieve the same result using Ansible. Here's the equivalent of what we just did with Terraform in this section:

---
- name: base security group
  ec2_group:
    name: base_security_group
    description: Base Security Group
    rules:
      - proto: tcp
        from_port: 22
        to_port: 22
        cidr_ip: 1.2.3.4/32

Summary

In this article, you learned how to configure the Terraform AWS provider, create and use an SSH key pair to use on AWS, and use AWS security groups with Terraform.

Resources for Article:

Further resources on this subject:

Deploying Highly Available OpenStack [article]
Introduction to Microsoft Azure Cloud Services [article]
Concepts for OpenStack [article]


The Software-defined Data Center

Packt
14 Nov 2016
33 min read
In this article by Valentin Hamburger, author of the book Building VMware Software-Defined Data Centers, we are introduced and briefed about the software-defined data center (SDDC) that has been introduced by VMware, to further describe the move to a cloud like IT experience. The term software-defined is the important bit of information. It basically means that every key function in the data center is performed and controlled by software, instead of hardware. This opens a whole new way of operating, maintaining but also innovating in a modern data center. (For more resources related to this topic, see here.) But how does a so called SDDC look like – and why is a whole industry pushing so hard towards its adoption? This question might also be a reason why you are reading this article, which is meant to provide a deeper understanding of it and give practical examples and hints how to build and run such a data center. Meanwhile it will also provide the knowledge of mapping business challenges with IT solutions. This is a practice which becomes more and more important these days. IT has come a long way from a pure back office, task oriented role in the early days, to a business relevant asset, which can help organizations to compete with their competition. There has been a major shift from a pure infrastructure provider role to a business enablement function. Today, most organizations business is just as good as their internal IT agility and ability to innovate. There are many examples in various markets where a whole business branch was built on IT innovations such as Netflix, Amazon Web Services, Uber, Airbnb – just to name a few. However, it is unfair to compare any startup with a traditional organization. A startup has one application to maintain and they have to build up a customer base. A traditional organization has a proven and wide customer base and many applications to maintain. So they need to adapt their internal IT to become a digital enterprise, with all the flexibility and agility of a startup, but also maintaining the trust and control over their legacy services. This article will cover the following points: Why is there a demand for SDDC in IT What is SDDC Understand the business challenges and map it to SDDC deliverables The relation of a SDDC and an internal private cloud Identify new data center opportunities and possibilities Become a center of innovation to empower your organizations business The demand for change Today organizations face different challenges in the market to stay relevant. The biggest move was clearly introduced by smartphones and tablets. It was not just a computer in a smaller device, they changed the way IT is delivered and consumed by end users. These devices proved that it can be simple to consume and install applications. Just search in an app store – choose what you like – use it as long as you like it. If you do not need it any longer, simply remove it. All with very simplistic commands and easy to use gestures. More and more people relying on IT services by using a smartphone as their terminal to almost everything. These devices created a demand for fast and easy application and service delivery. So in a way, smartphones have not only transformed the whole mobile market, they also transformed how modern applications and services are delivered from organizations to their customers. 
Although it would be quite unfair to compare a large enterprise data center with an app store or enterprise service delivery with any app installs on a mobile device, there are startups and industries which rely solely on the smartphone as their target for services, such as Uber or WhatsApp. On the other side, smartphone apps also introduce a whole new way of delivering IT services, since any company never knows how many people will use the app simultaneously. But in the backend they still have to use web servers and databases to continuously provide content and data for these apps. This also introduces a new value model for all other companies. People start to judge a company by the quality of their smartphone apps available. Also people started to migrate to companies which might offer a better smartphone integration as the previous one used. This is not bound to a single industry, but affects a broad spectrum of industries today such as the financial industry, car manufacturers, insurance groups, and even food retailers, just to name a few. A classic data center structure might not be ideal for quick and seamless service delivery. These architectures are created by projects to serve a particular use case for a couple of years. An example of this bigger application environments are web server farms, traditional SAP environments, or a data warehouse. Traditionally these were designed with an assumption about their growth and use. Special project teams have set them up across the data center pillars, as shown in the following figure. Typically, those project teams separate after such the application environment has been completed. All these pillars in the data center are required to work together, but every one of them also needs to mind their own business. Mostly those different divisions also have their own processes which than may integrate in a data center wide process. There was a good reason to structure a data center in this way, the simple fact that nobody can be an expert for every discipline. Companies started to create groups to operate certain areas in a data center, each building their own expertise for their own subject. This was evolving and became the most applied model for IT operations within organizations. Many, if not all, bigger organizations have adopted this approach and people build their careers on this definitions. It served IT well for decades and ensured that each party was adding its best knowledge to any given project. However, this setup has one flaw, it has not been designed for massive change and scale. The bigger these divisions get, the slower they can react to request from other groups in the data center. This introduces a bi-directional issue – since all groups may grow in a similar rate, the overall service delivery time might also increase exponentially. Unfortunately, this also introduces a cost factor when it comes to service deployments across these pillars. Each new service, an organization might introduce or develop, will require each area of IT to contribute. Traditionally, this is done by human hand overs from one department to the other. Each of these hand overs will delay the overall project time or service delivery time, which is also often referred to as time to market. It reflects the needed time interval from the request of a new service to its actual delivery. It is important to mention that this is a level of complexity every modern organization has to deal with, when it comes to application deployment today. 
The difference between organizations might be in the size of the separate units, but the principle is always the same. Most organizations try to bring their overall service delivery time down to be quicker and more agile. This is often related to business reasons as well as IT cost reasons. In some organizations the time to deliver a brand new service from request to final roll out may take 90 working days. This means – a requestor might wait 18 weeks or more than four and a half month from requesting a new business service to its actual delivery. Do not forget that this reflects the complete service delivery – over all groups until it is ready for production. Also, after these 90 days the requirement of the original request might have changed which would lead into repeating the entire process. Often a quicker time to market is driven by the lines of business (LOB) owners to respond to a competitor in the market, who might already deliver their services faster. This means that today's IT has changed from a pure internal service provider to a business enabler supporting its organization to fight the competition with advanced and innovative services. While this introduces a great chance to the IT department to enable and support their organizations business, it also introduces a threat at the same time. If the internal IT struggles to deliver what the business is asking for, it may lead to leverage shadow IT within the organization. The term shadow IT describes a situation where either the LOBs of an organization or its application developers have grown so disappointed with the internal IT delivery times, that they actually use an external provider for their requirements. This behavior is not agreed with the IT security and can lead to heavy business or legal troubles. This happens more often than one might expect, and it can be as simple as putting some internal files on a public cloud storage provider. These services grant quick results. It is as simple as Register – Download – Use. They are very quick in enrolling new users and sometimes provide a limited use for free. The developer or business owner might not even be aware that there is something non-compliant going on while using this services. So besides the business demand for a quicker service delivery and the security aspect, there an organizations IT department has now also the pressure of staying relevant. But SDDC can provide much more value to the IT than just staying relevant. The automated data center will be an enabler for innovation and trust and introduce a new era of IT delivery. It can not only provide faster service delivery to the business, it can also enable new services or offerings to help the whole organization being innovative for their customers or partners. Business challenges—the use case Today's business strategies often involve a digital delivery of services of any kind. This implies that the requirements a modern organization has towards their internal IT have changed drastically. Unfortunately, the business owners and the IT department tend to have communication issues in some organizations. Sometimes they even operate completely disconnected from each other, as if each of them where their own small company within the organization. Nevertheless, a lot of data center automation projects are driven by enhanced business requirements. In some of these cases, the IT department has not been made aware of what these business requirements look like, or even what the actual business challenges are. 
Sometimes IT just gets as little information as: "We are doing cloud now." This is a dangerous simplification, since the use case is key when it comes to designing and identifying the right solution to solve the organization's challenges. It is important to get the requirements from both sides: the IT delivery side as well as the business requirements and expectations. Here is a simple example of how a use case might be identified and mapped to a technical implementation.

The business view

John works as a business owner in an insurance company. He recognizes that their biggest competitor in the market started to offer a mobile application to their clients. The app is simple, allows online contract management, and tells the clients which products they have enrolled in, as well as rich information about contract timelines and possible consolidation options. He asks his manager to start a project to also deliver such an application to their customers. Since it is only a simple smartphone application, he expects that its development might take a couple of weeks and then they can start a beta phase. To be competitive, he estimates that they should have something useable for their customers within a maximum of 5 months. Based on these facts, he got approval from his manager to request such a product from the internal IT.

The IT view

Tom is the data center manager of this insurance company. He got informed that the business wants to have a smartphone application to do all kinds of things for the new and existing customers. He is responsible for creating a project and bringing all necessary people on board to support this project and finally deliver the service to the business. The programming of the app will be done by an external consulting company. Tom discusses a couple of questions regarding this request with his team:

How many users do we need to serve?
How much time do we need to create this environment?
What is the expected level of availability?
How much compute power/disk space might be required?

After a round of brainstorming and intense discussion, the team is still quite unsure how to answer these questions. For every question there are a couple of variables the team cannot predict. Will only a few of their thousands of users adopt the app? What if they undersize the middleware environment? What if the user adoption rises within a couple of days? What if it lowers and the environment is overpowered and therefore the cost is too high?

Tom and his team identified that they need a dynamic solution to be able to serve the business request. He creates a mapping to match possible technical capabilities to the use case. After this mapping was completed, he uses it to discuss with his CIO if and how it can be implemented.

Business challenge: an easy-to-use app to win new customers and keep existing ones.

Question: How many users do we need to serve?
IT capability: Dynamic scaling of an environment based on actual performance demand.

Question: How much time do we need to create this environment?
IT capability: To fulfill the expectations, the environment needs to be flexible. Start small – scale big.

Question: What is the expected level of availability?
IT capability: Analytics and monitoring over all layers, including a possible self-healing approach.

Question: How much compute power/disk space might be required?
IT capability: Create compute nodes based on actual performance requirements, on demand. Introduce a capacity-on-demand model for required resources.
Given this table, Tom revealed that with their current data center structure it is quite difficult to deliver what the business is asking for. Also, he got a couple of requirements from other departments, which are going in a similar direction. Based on these mappings, he identified that they need to change their way of deploying services and applications. They will need to use a fair amount of automation. Also, they have to span these functionalities across each data center department as a holistic approach, as shown in the following diagram: In this example, Tom actually identified a very strong use case for SDDC in his company. Based on the actual business requirements of a "simple" application, the whole IT delivery of this company needs to adopt. While this may sound like pure fiction, these are the challenges modern organizations need to face today. It is very important to identify the required capabilities for the entire data center and not just for a single department. You will also have to serve the legacy applications and bring them onto the new model. Therefore it is important to find a solution, which is serving the new business case as well as the legacy applications either way. In the first stage of any SDDC introduction in an organization, it is key to keep always an eye on the big picture. Tools to enable SDDC There is a basic and broadly accepted declaration of what a SDDC needs to offer. It can be considered as the second evolutionary step after server virtualization. It offers an abstraction layer from the infrastructure components such as compute, storage, and network by using automation and tools as such as a self service catalog In a way, it represents a virtualization of the whole data center with the purpose to simplify the request and deployment of complex services. Other capabilities of an SDDC are: Automated infrastructure/service consumption Policy based services and applications deployment Changes to services can be made easily and instantly All infrastructure layers are automated (storage, network, and compute) No human intervention is needed for infrastructure/service deployment High level of standardization is used Business logic is for chargeback or show back functionality All of the preceding points define a SDDC technically. But it is important to understand that a SDDC is considered to solve the business challenges of the organization running it. That means based on the actual business requirements, each SDDC will serve a different use case. Of course there is a main setup you can adopt and roll out – but it is important to understand your organizations business challenges in order to prevent any planning or design shortcomings. Also, to realize this functionality, SDDC needs a couple of software tools. These are designed to work together to deliver a seamless environment. The different parts can be seen like gears in a watch where each gear has an equally important role to make the clockwork function correctly. It is important to remember this when building your SDDC, since missing on one part can make another very complex or even impossible afterwards. This is a list of VMware tools building a SDDC: vRealize Business for Cloud vRealize Operations Manager vRealize Log Insight vRealize Automation vRealize Orchestrator vRealize Automation Converged Blueprint vRealize Code Stream VMware NSX VMware vSphere vRealize Business for Cloud is a charge back/show back tool. It can be used to track cost of services as well as the cost of a whole data center. 
Since the agility of a SDDC is much higher than for a traditional data center, it is important to track and show also the cost of adding new services. It is not only important from a financial perspective, it also serves as a control mechanism to ensure users are not deploying uncontrolled services and leaving them running even if they are not required anymore. vRealize Operations Manager is serving basically two functionalities. One is to help with the troubleshooting and analytics of the whole SDDC platform. It has an analytics engine, which applies machine learning to the behavior of its monitored components. The other important function is capacity management. It is capable of providing what-if analysis and informs about possible shortcomings of resources way before they occur. These functionalities also use the machine learning algorithms and get more accurate over time. This becomes very important in an dynamic environment where on-demand provisioning is granted. vRealize Log Insight is a unified log management. It offers rich functionality and can search and profile a large amount of log files in seconds. It is recommended to use it as a universal log endpoint for all components in your SDDC. This includes all OSes as well as applications and also your underlying hardware. In an event of error, it is much simpler to have a central log management which is easy searchable and delivers an outcome in seconds. vRealize Automation (vRA) is the base automation tool. It is providing the cloud portal to interact with your SDDC. The portal it provides offers the business logic such as service catalogs, service requests, approvals, and application life cycles. However, it relies strongly on vRealize Orchestrator for its technical automation part. vRA can also tap into external clouds to extend the internal data center. Extending a SDDC is mostly referred to as hybrid cloud. There are a couple of supported cloud offerings vRA can manage. vRealize Orchestrator (vRO) is providing the workflow engine and the technical automation part of the SDDC. It is literally the orchestrator of your new data center. vRO can be easily bound together with vRA to form a very powerful automation suite, where anything with an application programming interface (API) can be integrated. Also it is required to integrate third-party solutions into your deployment workflows, such as configuration management database (CMDB), IP address management (IPAM), or ticketing systems via IT service management (ITSM). vRealize Automation Converged Blueprint was formally known as vRealize Automation Application Services and is an add-on functionality to vRA, which takes care of application installations. It can be used with pre-existing scripts (like Windows PowerShell or Bash on Linux) – but also with variables received from vRA. This makes it very powerful when it comes to on demand application installations. This tool can also make use of vRO to provide even better capabilities for complex application installations. vRealize Code Stream is an addition to vRA and serves specific use cases in the DevOps area of the SDDC. It can be used with various development frameworks such as Jenkins. Also it can be used as a tool for developers to build and operate their own software test, QA and deployment environment. Not only can the developer build these separate stages, the migration from one stage into another can also be fully automated by scripts. 
This makes it a very powerful tool when it comes to stage and deploy modern and traditional applications within the SDDC. VMware NSX is the network virtualization component. Given the complexity some applications/services might introduce, NSX will provide a good and profound solution to help solving it. The challenges include: Dynamic network creation Microsegmentation Advanced security Network function virtualization VMware vSphere is mostly the base infrastructure and used as the hypervisor for server virtualization. You are probably familiar with vSphere and its functionalities. However, since the SDDC is introducing a change to you data center architecture, it is recommended to re-visit some of the vSphere functionalities and configurations. By using the full potential of vSphere it is possible to save effort when it comes to automation aspects as well as the service/application deployment part of the SDDC. This represents your toolbox required to build the platform for an automated data center. All of them will bring tremendous value and possibilities, but they also will introduce change. It is important that this change needs to be addressed and is a part of the overall SDDC design and installation effort. Embrace the change. The implementation journey While a big part of this article focuses on building and configuring the SDDC, it is important to mention that there are also non-technical aspects to consider. Creating a new way of operating and running your data center will always involve people. It is important to also briefly touch this part of the SDDC. Basically there are three major players when it comes to a fundamental change in any data center, as shown in the following image: Basically there are three major topics relevant for every successful SDDC deployment. Same as for the tools principle, these three disciplines need to work together in order to enable the change and make sure that all benefits can be fully leveraged. These three categories are: People Process Technology The process category Data center processes are as established and settled as IT itself. Beginning with the first operator tasks like changing tapes or starting procedures up to highly sophisticated processes to ensure that the service deployment and management is working as expected they have already come a long way. However, some of these processes might not be fit for purpose anymore, once automation is applied to a data center. To build a SDDC it is very important to revisit data center processes and adopt them to work with the new automation tasks. The tools will offer integration points into processes, but it is equally important to remove bottle necks for the processes as well. However, keep in mind that if you automate a bad process, the process will still be bad – but fully automated. So it is also necessary to re-visit those processes so that they can become slim and effective as well. Remember Tom, the data center manager. He has successfully identified that they need a SDDC to fulfill the business requirements and also did a use case to IT capabilities mapping. While this mapping is mainly talking about what the IT needs to deliver technically, it will also imply that the current IT processes need to adopt to this new delivery model. The process change example in Tom's organization If the compute department works on a service involving OS deployment, they need to fill out an Excel sheet with IP addresses and server names and send it to the networking department. 
The network admins will ensure that there is no double booking by reserving the IP address and approve the requested host name. After successfully proving the uniqueness of this data, name and IP gets added to the organizations DNS server. The manual part of this process is not longer feasible once the data center enters the automation era – imagine that every time somebody orders a service involving a VM/OS deploy, the network department gets an e-mail containing the Excel with the IP and host name combination. The whole process will have to stop until this step is manually finished. To overcome this, the process has to be changed to use an automated solution for IPAM. The new process has to track IP and host names programmatically to ensure there is no duplication within the entire data center. Also, after successfully checking the uniqueness of the data, it has to be added to the Domain Name System (DNS). While this is a simple example on one small process, normally there is a large number of processes involved which need to be re-viewed for a fully automated data center. This is a very important task and should not be underestimated since it can be a differentiator for success or failure of an SDDC. Think about all other processes in place which are used to control the deploy/enable/install mechanics in your data center. Here is a small example list of questions to ask regarding established processes: What is our current IPAM/DNS process? Do we need to consider a CMDB integration? What is our current ticketing process? (ITSM) What is our process to get resources from network, storage, and compute? What OS/VM deployment process is currently in place? What is our process to deploy an application (hand overs, steps, or departments involved)? What does our current approval process look like? Do we need a technical approval to deliver a service? Do we need a business approval to deliver a service? What integration process do we have for a service/application deployment? DNS, Active Directory (AD), Dynamic Host Configuration Protocol (DHCP), routing, Information Technology Infrastructure Library (ITIL), and so on Now for the approval question, normally these are an exception for the automation part, since approvals are meant to be manual in the first place (either technical or business). If all the other answers to this example questions involve human interaction as well, consider to change these processes to be fully automated by the SDDC. Since human intervention creates waiting times, it has to be avoided during service deployments in any automated data center. Think of it as the robotic construction bands todays car manufacturers are using. The processes they have implemented, developed over ages of experience, are all designed to stop the band only in case of an emergency. The same comes true for the SDDC – try to enable the automated deployment through your processes, stop the automation only in case of an emergency. Identifying processes is the simple part, changing them is the tricky part. However, keep in mind that this is an all new model of IT delivery, therefore there is no golden way of doing it. Once you have committed to change those processes, keep monitoring if they truly fulfill their requirement. This leads to another process principle in the SDDC: Continual Service Improvement (CSI). Re-visit what you have changed from time to time and make sure that those processes are still working as expected, if they don't, change them again. 
The people category

Since every data center is run by people, it is important to consider that a change of technology will also impact those people. There are claims that an SDDC can be run with only half of the staff, or save a couple of employees, since everything is automated. The truth is, an SDDC will transform IT roles in a data center. This means that some classic roles might vanish, while others will be added by this change. It is unrealistic to say that you can run an automated data center with half the staff it took before. But it is realistic to say that your staff can concentrate on innovation and development instead of working 100% on keeping the lights on. And this is the change an automated data center introduces. It opens up the possibility for current administrators to evolve into more architecture- and design-focused roles.

The people example in Tom's organization

Currently there are two admins in the compute department working for Tom. They are managing and maintaining the virtual environment, which is largely VMware vSphere. They are creating VMs manually, deploying an OS by a network install routine (which was a requirement for physical installs – so they kept the process) and then handing the ready VMs over to the next department to finish installing the service they are meant for. Recently they have experienced a lot of demand for VMs, and each of them configures 10 to 12 VMs per day. Given this, they cannot concentrate on other aspects of their job, like improving OS deployments or the handover process.

At first glance it seems like the SDDC might replace these two employees, since the tools will largely automate their work. But that is like saying a jackhammer will replace a construction worker. Actually, their roles will shift to a more architectural focus. They need to come up with a template for OS installations and a plan for how to further automate the deployment process. They might also need to add new services/parts to the SDDC in order to continuously fulfill the business needs. So instead of creating all the VMs manually, they are now focused on designing a blueprint that can be replicated as easily and efficiently as possible. While their tasks might have changed, their workforce is still important to operate and run the SDDC. However, given that they now focus on design and architectural tasks, they also have the time to introduce innovative functions and additions to the data center.

Keep in mind that an automated data center affects all departments in an IT organization. This means that the tasks of the network and storage teams, as well as the application and database teams, will change too. In fact, in an SDDC it is quite impossible to keep operating the departments disconnected from each other, since a deployment will affect all of them. This also implies that all of these departments will have admins shifting to higher-level functions in order to make the automation possible. In the industry, this shift is often referred to as Operational Transformation. This basically means that not only do the tools have to be in place, you also have to change the way the staff operates the data center.

In most cases organizations decide to form a so-called center of excellence (CoE) to administer and operate the automated data center. This virtual group of admins in a data center is very similar to project groups in traditional data centers. The difference is that these people should be permanently assigned to the CoE for an SDDC.
Typically you might have one champion from each department taking part in this virtual team. Each person acts as an expert and ambassador for their department. With this principle, it can be ensured that decisions and overlapping processes are well defined and ready to function across the departments. Also, as an ambassador, each participant should advertise the new functionalities within their department and enable their colleagues to fully support the new data center approach. It is important that each member of the CoE has good expertise in terms of technology as well as good communication skills.

The technology category

This is the third aspect of the triangle to successfully implement an SDDC in your environment. Often this is the part where people focus most of their attention, sometimes while ignoring one of the other two parts. However, it is important to note that all three topics need to be considered equally. Think of it like a three-legged chair: if one leg is missing, it can never stand.

The term technology does not necessarily refer only to new tools required to deploy services. It also refers to already established technology, which has to be integrated with the automation toolset (often referred to as third-party integration). This might be your AD, DHCP server, e-mail system, and so on. There might also be technology which is not enabling or empowering the data center automation, so instead of only thinking about adding tools, there might also be tools to be removed or replaced. This is a normal IT lifecycle task and has gone through many iterations already. Think of things like the fax machine or the telex – you might not use them anymore; they have been replaced by e-mail and messaging.

The technology example in Tom's organization

The team uses some tools to make their daily work easier when it comes to new service deployments. One of the tools is a little graphical user interface to quickly add content to AD. The admins use it to insert the host name and Organizational Unit, as well as to create the computer account. This was meant to save admin time, since they don't have to open all the various menus in the AD configuration to accomplish these tasks. With automated service delivery, this has to be done programmatically. Once a new OS is deployed, it has to be added to AD, including all requirements, by the deployment tool. Since AD offers an API, this can easily be automated and integrated into the deployment automation. Instead of painfully integrating the graphical tool, this is now done directly by interfacing with the organization's AD, ultimately replacing the old graphical tool.

The automated deployment of a service across the entire data center requires a fair amount of communication. Not in a traditional way, but machine-to-machine communication leveraging programmable interfaces. Using such APIs is another important aspect of the applied data center technologies. Most of today's data center tools, from backup all the way up to web servers, do come with APIs. The better the API is documented, the easier the integration into the automation tool. In some cases you might need the vendors to support you with the integration of their tools. If you have identified a tool in the data center which does not offer any API or even a command-line interface (CLI) option at all, try to find a way around this software or even consider replacing it with a new tool. APIs are the equivalent of handovers in the manual world.
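As a sketch of what "interfacing with the organization's AD" could look like, the following Python snippet uses the ldap3 library to create a computer account over LDAP, replacing the manual clicks of the old GUI tool. The server name, service account, and OU are hypothetical placeholders for Tom's environment, and a production version would set additional attributes (such as userAccountControl) and handle errors.

from ldap3 import ALL, Connection, Server

# Hypothetical connection details -- substitute your own directory and service account.
server = Server("ldaps://dc01.example.local", get_info=ALL)
conn = Connection(server, user="EXAMPLE\\svc-automation", password="***", auto_bind=True)

# Create the computer object that used to be added by hand through the GUI tool.
computer_dn = "CN=WEB042,OU=Servers,DC=example,DC=local"
conn.add(computer_dn, "computer", {"sAMAccountName": "WEB042$"})
print(conn.result)  # shows whether the directory accepted the change

conn.unbind()

The same call can be embedded into the deployment workflow so that every newly provisioned OS registers itself in AD automatically, with no manual step left in the chain.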
The better the communication works between tools, the faster and easier the deployment will be completed. To coordinate and control all this communication, you will need far more than scripts to run. This is a task for an orchestrator, which can run all necessary integration workflows from a central point. This orchestrator will act like the conductor of a big orchestra. It will form the backbone of your SDDC.

Why are these three topics so important?

The technology aspect closes the triangle and brings the people and process parts together. If the processes are not altered to fit the new deployment methods, automation will be painful and complex to implement. If the deployment stops at some point because the processes require manual intervention, the people will have to fill in this gap. This means that they now have new roles, but also need to maintain some of their old tasks to keep the process running. By introducing such an unbalanced implementation of an automated data center, the workload for people can actually increase, while the service delivery times may not dramatically decrease. This may lead to avoidance of the automated tasks, since manual intervention might be seen as faster by individual admins.

So it is very important to accept all three aspects as the main parts of the SDDC implementation journey. They all need to be addressed equally and thoughtfully to unveil the benefits and improvements an automated data center has to offer. However, keep in mind that this truly is a journey. An SDDC is not implemented in days but in months. Given this, the implementation team in the data center also has this time to adapt themselves and their processes to this new way of delivering IT services. All necessary departments and their leads need to be involved in this procedure as well. An SDDC implementation is always a team effort.

Additional possibilities and opportunities

All the previously mentioned topics serve the sole goal of installing and using the SDDC within your data center. However, once you have the SDDC running, the real fun begins, since you can start to introduce additional functionalities impossible for any traditional data center. Let's just briefly touch on some of the possibilities from an IT point of view.

The self-healing data center

This is a concept where the automatic deployment of services is connected to a monitoring system. Once the monitoring system detects that a service or environment may be facing constraints, it can automatically trigger an additional deployment for this service to increase the throughput. While this is application dependent, for infrastructure services it can become quite handy. Think of ESXi host auto-deployments if compute power is becoming a constraint, or datastore deployments if disk space is running low. If this automation is acting too aggressively for your organization, it can be used with an approval function. Once the monitoring detects a shortcoming, it will ask for approval to fix it with a deployment action. Instead of getting an e-mail from your monitoring system that a constraint has been identified, you get an e-mail with the constraint and the resolving action. All you need to do is approve the action.

The self-scaling data center

A similar principle is to use a capacity management tool to predict the growth of your environment. If it approaches a trigger, the system can automatically generate an order letter containing all the components needed to satisfy the growing capacity demands.
This can then be sent to finance or purchasing management for approval, and before you even run into any capacity constraints, the new gear might be available and ready to run. However, consider the regular turnaround time for ordering hardware, which might affect how far in the future you have to set the trigger for such functionality.

Both of these opportunities are more than just nice-to-haves; they enable your data center to be truly flexible and proactive. Because an SDDC offers a high degree of agility, it will also need some self-monitoring to stay flexible and usable and to fulfill unpredicted demand.

Summary

In this article we discussed the main principles and declarations of an SDDC. It provided an overview of the opportunities and possibilities this new data center architecture provides. It also covered the changes which will be introduced by this new approach. Finally, it discussed the implementation journey and its involvement with people, processes, and technology.

Resources for Article: Further resources on this subject: VM, It Is Not What You Think! [article] Introducing vSphere vMotion [article] Creating a VM using VirtualBox - Ubuntu Linux [article]

RDO Installation

Packt
09 Aug 2016
26 min read
In this article by Dan Radez, author of OpenStack Essentials - Second Edition, we will see how OpenStack has a very modular design, and because of this design, there are lots of moving parts. It is overwhelming to start walking through installing and using OpenStack without understanding the internal architecture of the components that make up OpenStack. In this article, we'll look at these components. Each component in OpenStack manages a different resource that can be virtualized for the end user. Separating the management of each of the types of resources that can be virtualized into separate components makes the OpenStack architecture very modular. If a particular service or resource provided by a component is not required, then the component is optional to an OpenStack deployment. Once the components that make up OpenStack have been covered, we will discuss the configuration of a community-supported distribution of OpenStack called RDO. (For more resources related to this topic, see here.) OpenStack architecture Let's start by outlining some simple categories to group these services into. Logically, the components of OpenStack are divided into three groups: Control Network Compute The control tier runs the Application Programming Interface (API) services, web interface, database, and message bus. The network tier runs network service agents for networking, and the compute tier is the virtualization hypervisor. It has services and agents to handle virtual machines. All of the components use a database and/or a message bus. The database can be MySQL, MariaDB, or PostgreSQL. The most popular message buses are RabbitMQ, Qpid, and ActiveMQ. For smaller deployments, the database and messaging services usually run on the control node, but they could have their own nodes if required. In a simple multi-node deployment, the control and networking services are installed on one server and the compute services are installed onto another server. OpenStack could be installed on one node or more than two nodes, but a good baseline for being able to scale out later is to put control and network together and compute by itself Now that a base logical architecture of OpenStack has been defined, let's look at what components make up this basic architecture. To do that, we'll first touch on the web interface and then work toward collecting the resources necessary to launch an instance. Finally, we will look at what components are available to add resources to a launched instance. Dashboard The OpenStack dashboard is the web interface component provided with OpenStack. You'll sometimes hear the terms dashboard and Horizon used interchangeably. Technically, they are not the same thing. This article will refer to the web interface as the dashboard. The team that develops the web interface maintains both the dashboard interface and the Horizon framework that the dashboard uses. More important than getting these terms right is understanding the commitment that the team that maintains this code base has made to the OpenStack project. They have pledged to include support for all the officially accepted components that are included in OpenStack. Visit the OpenStack website (http://www.openstack.org/) to get an official list of OpenStack components. The dashboard cannot do anything that the API cannot do. All the actions that are taken through the dashboard result in calls to the API to complete the task requested by the end user. 
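To make this point concrete, the same actions the dashboard performs can be driven from a few lines of client code against the API. The sketch below uses the openstacksdk Python library; it is only an illustration, and the authentication URL, credentials, and project name are placeholders rather than values from this installation.

import openstack

# Placeholder credentials -- substitute the values from your own cloud.
conn = openstack.connect(
    auth_url="https://keystone.example.com:5000/v3",
    project_name="admin",
    username="admin",
    password="secret",
    user_domain_name="Default",
    project_domain_name="Default",
)

# Everything the dashboard shows ultimately comes from API calls like these.
for image in conn.image.images():
    print("image:", image.name)
for server in conn.compute.servers():
    print("server:", server.name)

Whether you click through the web interface or run code like this, the requests arriving at the OpenStack services are the same.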
Throughout this article, we will examine how to use the web interface and the API clients to execute tasks in an OpenStack cluster. Next, we will discuss both the dashboard and the underlying components that the dashboard makes calls to when creating OpenStack resources. Keystone Keystone is the identity management component. The first thing that needs to happen while connecting to an OpenStack deployment is authentication. In its most basic installation, Keystone will manage tenants, users, and roles and be a catalog of services and endpoints for all the components in the running cluster. Everything in OpenStack must exist in a tenant. A tenant is simply a grouping of objects. Users, instances, and networks are examples of objects. They cannot exist outside of a tenant. Another name for a tenant is a project. On the command line, the term tenant is used. In the web interface, the term project is used. Users must be granted a role in a tenant. It's important to understand this relationship between the user and a tenant via a role. For now, understand that a user cannot log in to the cluster unless they are a member of a tenant. Even the administrator has a tenant. Even the users the OpenStack components use to communicate with each other have to be members of a tenant to be able to authenticate. Keystone also keeps a catalog of services and endpoints of each of the OpenStack components in the cluster. This is advantageous because all of the components have different API endpoints. By registering them all with Keystone, an end user only needs to know the address of the Keystone server to interact with the cluster. When a call is made to connect to a component other than Keystone, the call will first have to be authenticated, so Keystone will be contacted regardless. Within the communication to Keystone, the client also asks Keystone for the address of the component the user intended to connect to. This makes managing the endpoints easier. If all the endpoints were distributed to the end users, then it would be a complex process to distribute a change in one of the endpoints to all of the end users. By keeping the catalog of services and endpoints in Keystone, a change is easily distributed to end users as new requests are made to connect to the components. By default, Keystone uses username/password authentication to request a token and the acquired tokens for subsequent requests. All the components in the cluster can use the token to verify the user and the user's access. Keystone can also be integrated into other common authentication systems instead of relying on the username and password authentication provided by Keystone Glance Glance is the image management component. Once we're authenticated, there are a few resources that need to be available for an instance to launch. The first resource we'll look at is the disk image to launch from. Before a server is useful, it needs to have an operating system installed on it. This is a boilerplate task that cloud computing has streamlined by creating a registry of pre-installed disk images to boot from. Glance serves as this registry within an OpenStack deployment. In preparation for an instance to launch, a copy of a selected Glance image is first cached to the compute node where the instance is being launched. Then, a copy is made to the ephemeral disk location of the new instance. Subsequent instances launched on the same compute node using the same disk image will use the cached copy of the Glance image. 
The images stored in Glance are sometimes called sealed-disk images. These images are disk images that have had the operating system installed but have had things such as the Secure Shell (SSH) host key and network device MAC addresses removed. This makes the disk images generic, so they can be reused and launched repeatedly without the running copies conflicting with each other. To do this, the host-specific information is provided or generated at boot. The provided information is passed in through a post-boot configuration facility called cloud-init. Usually, these images are downloaded from distribution's download pages. If you search the internet for your favorite distribution's name and cloud image, you will probably get a link to where to download a generic pre-built copy of a Glance image, also known as a cloud image. The images can also be customized for special purposes beyond a base operating system installation. If there was a specific purpose for which an instance would be launched many times, then some of the repetitive configuration tasks could be performed ahead of time and built into the disk image. For example, if a disk image was intended to be used to build a cluster of web servers, it would make sense to install a web server package on the disk image before it was used to launch an instance. It would save time and bandwidth to do it once before it is registered with Glance instead of doing this package installation and configuration over and over each time a web server instance is booted. There are quite a few ways to build these disk images. The simplest way is to do a virtual machine installation manually, make sure that the host-specific information is removed, and include cloud-init in the built image. Cloud-init is packaged in most major distributions; you should be able to simply add it to a package list. There are also tools to make this happen in a more autonomous fashion. Some of the more popular tools are virt-install, Oz, and appliance-creator. The most important thing about building a cloud image for OpenStack is to make sure that cloud-init is installed. Cloud-init is a script that should run post boot to connect back to the metadata service. Neutron Neutron is the network management component. With Keystone, we're authenticated, and from Glance, a disk image will be provided. The next resource required for launch is a virtual network. Neutron is an API frontend (and a set of agents) that manages the Software Defined Networking (SDN) infrastructure for you. When an OpenStack deployment is using Neutron, it means that each of your tenants can create virtual isolated networks. Each of these isolated networks can be connected to virtual routers to create routes between the virtual networks. A virtual router can have an external gateway connected to it, and external access can be given to each instance by associating a floating IP on an external network with an instance. Neutron then puts all the configuration in place to route the traffic sent to the floating IP address through these virtual network resources into a launched instance. This is also called Networking as a Service (NaaS). NaaS is the capability to provide networks and network resources on demand via software. By default, the OpenStack distribution we will install uses Open vSwitch to orchestrate the underlying virtualized networking infrastructure. Open vSwitch is a virtual managed switch. 
As long as the nodes in your cluster have simple connectivity to each other, Open vSwitch can be the infrastructure configured to isolate the virtual networks for the tenants in OpenStack. There are also many vendor plugins that would allow you to replace Open vSwitch with a physical managed switch to handle the virtual networks. Neutron even has the capability to use multiple plugins to manage multiple network appliances. As an example, Open vSwitch and a vendor's appliance could be used in parallel to manage virtual networks in an OpenStack deployment. This is a great example of how OpenStack is built to provide flexibility and choice to its users. Networking is the most complex component of OpenStack to configure and maintain. This is because Neutron is built around core networking concepts. To successfully deploy Neutron, you need to understand these core concepts and how they interact with one another. Nova Nova is the instance management component. An authenticated user who has access to a Glance image and has created a network for an instance to live on is almost ready to tie all of this together and launch an instance. The last resources that are required are a key pair and a security group. A key pair is simply an SSH key pair. OpenStack will allow you to import your own key pair or generate one to use. When the instance is launched, the public key is placed in the authorized_keys file so that a password-less SSH connection can be made to the running instance. Before that SSH connection can be made, the security groups have to be opened to allow the connection to be made. A security group is a firewall at the cloud infrastructure layer. The OpenStack distribution we'll use will have a default security group with rules to allow instances to communicate with each other within the same security group, but rules will have to be added for Internet Control Message Protocol (ICMP), SSH, and other connections to be made from outside the security group. Once there's an image, network, key pair, and security group available, an instance can be launched. The resource's identifiers are provided to Nova, and Nova looks at what resources are being used on which hypervisors, and schedules the instance to spawn on a compute node. The compute node gets the Glance image, creates the virtual network devices, and boots the instance. During the boot, cloud-init should run and connect to the metadata service. The metadata service provides the SSH public key needed for SSH login to the instance and, if provided, any post-boot configuration that needs to happen. This could be anything from a simple shell script to an invocation of a configuration management engine. Cinder Cinder is the block storage management component. Volumes can be created and attached to instances. Then they are used on the instances as any other block device would be used. On the instance, the block device can be partitioned and a filesystem can be created and mounted. Cinder also handles snapshots. Snapshots can be taken of the block volumes or of instances. Instances can also use these snapshots as a boot source. There is an extensive collection of storage backends that can be configured as the backing store for Cinder volumes and snapshots. By default, Logical Volume Manager (LVM) is configured. GlusterFS and Ceph are two popular software-based storage solutions. There are also many plugins for hardware appliances. Swift Swift is the object storage management component. Object storage is a simple content-only storage system. 
Files are stored without the metadata that a block filesystem has. These are simply containers and files. The files are simply content. Swift has two layers as part of its deployment: the proxy and the storage engine. The proxy is the API layer. It's the service that the end user communicates with. The proxy is configured to talk to the storage engine on the user's behalf. By default, the storage engine is the Swift storage engine. It's able to do software-based storage distribution and replication. GlusterFS and Ceph are also popular storage backends for Swift. They have similar distribution and replication capabilities to those of Swift storage. Ceilometer Ceilometer is the telemetry component. It collects resource measurements and is able to monitor the cluster. Ceilometer was originally designed as a metering system for billing users. As it was being built, there was a realization that it would be useful for more than just billing and turned into a general-purpose telemetry system. Ceilometer meters measure the resources being used in an OpenStack deployment. When Ceilometer reads a meter, it's called a sample. These samples get recorded on a regular basis. A collection of samples is called a statistic. Telemetry statistics will give insights into how the resources of an OpenStack deployment are being used. The samples can also be used for alarms. Alarms are nothing but monitors that watch for a certain criterion to be met. Heat Heat is the orchestration component. Orchestration is the process of launching multiple instances that are intended to work together. In orchestration, there is a file, known as a template, used to define what will be launched. In this template, there can also be ordering or dependencies set up between the instances. Data that needs to be passed between the instances for configuration can also be defined in these templates. Heat is also compatible with AWS CloudFormation templates and implements additional features in addition to the AWS CloudFormation template language. To use Heat, one of these templates is written to define a set of instances that needs to be launched. When a template launches, it creates a collection of virtual resources (instances, networks, storage devices, and so on); this collection of resources is called a stack. When a stack is spawned, the ordering and dependencies, shared configuration data, and post-boot configuration are coordinated via Heat. Heat is not configuration management. It is orchestration. It is intended to coordinate launching the instances, passing configuration data, and executing simple post-boot configuration. A very common post-boot configuration task is invoking an actual configuration management engine to execute more complex post-boot configuration. OpenStack installation The list of components that have been covered is not the full list. This is just a small subset to get you started with using and understanding OpenStack. Further components that are defaults in an OpenStack installation provide many advanced capabilities that we will not be able to cover. Now that we have introduced the OpenStack components, we will illustrate how they work together as a running OpenStack installation. To illustrate an OpenStack installation, we first need to install one. Let's use the RDO Project's OpenStack distribution to do that. RDO has two installation methods; we will discuss both of them and focus on one of them throughout this article. 
Manual installation and configuration of OpenStack involves installing, configuring, and registering each of the components we covered in the previous part, and also multiple databases and a messaging system. It's a very involved, repetitive, error-prone, and sometimes confusing process. Fortunately, there are a few distributions that include tools to automate this installation and configuration process. One such distribution is the RDO Project distribution.

RDO, as a name, doesn't officially mean anything. It is just the name of a community-supported distribution of OpenStack. The RDO Project takes the upstream OpenStack code, packages it in RPMs, and provides documentation, forums, IRC channels, and other resources for the RDO community to use and support each other in running OpenStack on RPM-based systems. There are no modifications to the upstream OpenStack code in the RDO distribution. The RDO Project packages the code that is in each of the upstream releases of OpenStack. This means that we'll use an open source, community-supported distribution of vanilla OpenStack for our example installation. RDO should be able to run on any RPM-based system.

We will now look at the two installation tools that are part of the RDO Project, Packstack and RDO Triple-O. We will focus on using RDO Triple-O in this article. The RDO Project recommends RDO Triple-O for installations that intend to deploy a more feature-rich environment. One example is High Availability: RDO Triple-O is able to do HA deployments and Packstack is not. There is still great value in doing an installation with Packstack, though. Packstack is intended to give you a very lightweight, quick way to stand up a basic OpenStack installation. Let's start by taking a quick look at Packstack so you are familiar with how quick and lightweight it is.

Installing RDO using Packstack

Packstack is an installation tool for OpenStack intended for demonstration and proof-of-concept deployments. Packstack uses SSH to connect to each of the nodes and invokes a puppet run (specifically, a puppet apply) on each of the nodes to install and configure OpenStack.

RDO website: http://openstack.redhat.com
Packstack installation: http://openstack.redhat.com/install/quickstart

The RDO Project quick start gives instructions to install RDO using Packstack in three simple steps:

1. Update the system and install the RDO release rpm as follows:

sudo yum update -y
sudo yum install -y http://rdo.fedorapeople.org/rdo-release.rpm

2. Install Packstack as shown in the following command:

sudo yum install -y openstack-packstack

3. Run Packstack as shown in the following command:

sudo packstack --allinone

The all-in-one installation method works well when run on a virtual machine as your all-in-one OpenStack node. In reality, however, a cluster will usually use more than one node beyond a simple learning environment. Packstack is capable of doing multinode installations, though you will have to read the RDO Project documentation for Packstack on the RDO Project wiki. We will not go any deeper with Packstack than the all-in-one installation we have just walked through. Don't avoid doing an all-in-one installation; it really is as simple as the steps make it out to be, and there is value in getting an OpenStack installation up and running quickly.

Installing RDO using Triple-O

The Triple-O project is an OpenStack installation tool developed by the OpenStack community. A Triple-O deployment consists of two OpenStack deployments.
One of the deployments is an all-in-one OpenStack installation that is used as a provisioning tool to deploy a multi-node target OpenStack deployment. This target deployment is the deployment intended for end users. Triple-O stands for OpenStack on OpenStack. OpenStack on OpenStack would be OOO, which lovingly became referred to as Triple-O. It may sound like madness to use OpenStack to deploy OpenStack, but consider that OpenStack is really good at provisioning virtual instances. Triple-O applies this strength to bare-metal deployments to deploy a target OpenStack environment.

In Triple-O, the two OpenStacks are called the undercloud and the overcloud. The undercloud is a baremetal-management-enabled all-in-one OpenStack installation that will be built for you in a very prescriptive way. Baremetal-management-enabled means it is intended to manage physical machines instead of virtual machines. The overcloud is the target deployment of OpenStack that is intended to be exposed to end users. The undercloud will take a cluster of nodes provided to it and deploy the overcloud to them, a fully featured OpenStack deployment. In real deployments, this is done with a collection of baremetal nodes. Fortunately, for learning purposes, we can mock having a bunch of baremetal nodes by using virtual machines. Mind blown yet?

Let's get started with this RDO Manager based OpenStack installation to start unraveling what all this means. There is an RDO Manager quickstart project that we will use to get going. The RDO Triple-O wiki page will be the most up-to-date place to get started with RDO Triple-O. If you have trouble with the directions in this article, please refer to the wiki. Open source changes rapidly and RDO Triple-O is no exception. In particular, note that the directions refer to the Mitaka release of OpenStack. The name of the release will most likely be the first thing that changes on the wiki page that will impact your future deployments with RDO Triple-O.

Start by downloading the pre-built undercloud image from the RDO Project's repositories. This is something you could build yourself, but it would take much more time and effort to build than it would take to download the pre-built one. As mentioned earlier, the undercloud is a pretty prescriptive all-in-one deployment, which lends itself well to starting with a pre-built image. These instructions come from the readme of the tripleo-quickstart GitHub repository (https://github.com/redhat-openstack/tripleo-quickstart/):

myhost# mkdir -p /usr/share/quickstart_images/
myhost# cd /usr/share/quickstart_images/
myhost# wget https://ci.centos.org/artifacts/rdo/images/mitaka/delorean/stable/undercloud.qcow2.md5 https://ci.centos.org/artifacts/rdo/images/mitaka/delorean/stable/undercloud.qcow2

Make sure that your ssh key exists:

myhost# ls ~/.ssh

If you don't see the id_rsa and id_rsa.pub files in that directory listing, run the command ssh-keygen.
Then make sure that your public key is in the authorized keys file:

myhost# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

Once you have the undercloud image and your ssh keys, pull a copy of the quickstart.sh file, install the dependencies, and execute the quickstart script:

myhost# cd ~
myhost# wget https://raw.githubusercontent.com/redhat-openstack/tripleo-quickstart/master/quickstart.sh
myhost# sh quickstart.sh -u file:///usr/share/quickstart_images/undercloud.qcow2 localhost

quickstart.sh will use Ansible to set up the undercloud virtual machine and will define a few extra virtual machines that will be used to mock a collection of baremetal nodes for an overcloud deployment. To see the list of virtual machines that quickstart.sh created, use virsh to list them:

myhost# virsh list --all
 Id    Name          State
----------------------------------------------------
 17    undercloud    running
 -     ceph_0        shut off
 -     compute_0     shut off
 -     control_0     shut off
 -     control_1     shut off
 -     control_2     shut off

Along with the undercloud virtual machine, there are ceph, compute, and control virtual machine definitions. These are the nodes that will be used to deploy the OpenStack overcloud. Using virtual machines like this to deploy OpenStack is not suitable for anything but your own personal OpenStack enrichment. These virtual machines represent the physical machines that would be used in a real deployment exposed to end users. To continue the undercloud installation, connect to the undercloud virtual machine and run the undercloud configuration:

myhost# ssh -F /root/.quickstart/ssh.config.ansible undercloud
undercloud# openstack undercloud install

The undercloud install command will set up the undercloud machine as an all-in-one OpenStack installation, ready to be told how to deploy the overcloud. Once the undercloud installation is completed, the final steps are to seed the undercloud with configuration about the overcloud deployment and execute the overcloud deployment:

undercloud# source stackrc
undercloud# openstack overcloud image upload
undercloud# openstack baremetal import --json instackenv.json
undercloud# openstack baremetal configure boot
undercloud# neutron subnet-list
undercloud# neutron subnet-update <subnet-uuid> --dns-nameserver 8.8.8.8

There are also some scripts and other automated ways to make these steps happen: look at the output of the quickstart script or the Triple-O quickstart docs in the GitHub repository to get more information about how to automate some of these steps. The source command puts information into the shell environment to tell the subsequent commands how to communicate with the undercloud. The image upload command uploads into Glance the disk images that will be used to provision the overcloud nodes. The first baremetal command imports information about the overcloud environment that will be deployed. This information was written to the instackenv.json file when the undercloud virtual machine was created by quickstart.sh. The second configures the images that were just uploaded in preparation for provisioning the overcloud nodes. The two neutron commands configure a DNS server for the network that the overcloud will use, in this case Google's. Finally, execute the overcloud deploy:

undercloud# openstack overcloud deploy --control-scale 1 --compute-scale 1 --templates --libvirt-type qemu --ceph-storage-scale 1 -e /usr/share/openstack-tripleo-heat-templates/environments/storage-environment.yaml

Let's talk about what this command is doing.
In OpenStack, there are two basic node types, control and compute. A control node runs the OpenStack API services, the OpenStack scheduling service, database services, and messaging services. Pretty much everything except the hypervisors is part of the control tier and is segregated onto control nodes in a basic deployment. In an HA deployment, there are at least three control nodes. This is why you see three control nodes in the list of virtual machines quickstart.sh created. RDO Triple-O can do HA deployments, though we will focus on non-HA deployments in this article. Note that in the command you have just executed, control scale and compute scale are both set to one. This means that you are deploying one control and one compute node. The other virtual machines will not be used.

Take note of the libvirt-type parameter. It is only required if the compute node itself is virtualized, which is what we are doing with RDO Triple-O, to set the configuration properly for the instances to be nested. Nested virtualization is when virtual machines are running inside of a virtual machine. In this case, the instances will be virtual machines running inside of the compute node, which is itself a virtual machine. Finally, the ceph storage scale and storage environment file will deploy Ceph as the storage backend for Glance and Cinder. If you leave off the Ceph and storage environment file parameters, one less virtual machine will be used for the deployment.

The indication that the overcloud deploy has succeeded is a Keystone endpoint and a success message:

Overcloud Endpoint: http://192.0.2.6:5000/v2.0
Overcloud Deployed

Connecting to your Overcloud

Finally, before we dig into looking at the OpenStack components that have been installed and configured, let's identify three ways that you can connect to the freshly installed overcloud deployment:

From the undercloud: This is the quickest way to access the overcloud. When the overcloud deployment completed, a file named overcloudrc was created.

Install the client libraries: Both RDO Triple-O and Packstack were installed from the RDO release repository. By installing this release repository on another computer, in the same way that was demonstrated earlier for Packstack, the OpenStack client libraries can be installed on that computer. If these libraries are installed on a computer that can route to the network the overcloud was installed on, then the overcloud can be accessed from that computer the same as it can from the undercloud. This is helpful if you do not want to be tied to jumping through the undercloud node to access the overcloud:

laptop# sudo yum install -y http://rdo.fedorapeople.org/rdo-release.rpm
laptop# sudo yum install python-openstackclient

In addition to the client package, you will also need the overcloudrc file from the undercloud. As an example, you can install the packages on the host machine on which you have just run quickstart.sh and make the overcloud routable by adding an IP address to the OVS bridge the virtual machines were attached to:

myhost# sudo ip addr add 192.0.2.222/24 dev bridget
myhost# sudo ip link set up dev bridget

Once this is done, the commands in the subsequent parts could be run from the host machine instead of the undercloud virtual machine.

The OpenStack dashboard: OpenStack's included web interface is called the dashboard.
In the installation you have just completed, you can access the overcloud's dashboard by first running the two ip commands used in the second option above then connecting to the IP address indicated as the overcloud endpoint but on port 80 instead of 5000: http://192.0.2.6/. Summary After looking at the components that make up an OpenStack installation, we used RDO Triple-O as a provisioning tool. We now have OpenStack installed and running. Now that OpenStack is installed and running, let's walk through each of the components discussed to learn how to use each of them. Resources for Article: Further resources on this subject: Keystone – OpenStack Identity Service [article] Concepts for OpenStack [article] Setting up VPNaaS in OpenStack [article]

Deploying a Docker Container to the Cloud, Part 2

Darwin Corn
13 Jul 2016
3 min read
I previously wrote about app containerization using Docker, and if you’re unfamiliar with that concept, please read that post first. In this post, I'm going to pick up where I left off, with a fully containerized frontend ember application showcasing my music that I now want to share with the world. Speaking of that app in part 1—provided you don't have a firewall blocking port 80 inbound—if you've come straight over from the previous post, you're serving a web app to everyone on your internal network right now. You should, of course, map it to only allow 127.0.0.1 on port 80 instead of 0.0.0.0 (everyone). In this post I am going to focus on my mainstream cloud platform of choice, Google Cloud Platform (GCP). It will only cost ~$5/month, with room to house more similarly simple apps—MVPs, proofs of concept and the like. Go ahead and sign up for the free GCP trial, and create a project. Templates are useful for rapid scaling and minimizing the learning curve; but for the purpose of learning, how this actually works, and for minimizing financial impact, they're next to useless. First, you need to get the container into the private registry that comes with every GCP project. Okay, let's get started. You need to tag the image so that Google Cloud Platform knows where to put it. Then you're going to use the gcloud command-line tool to push it to that cloud registry. $ docker tag docker-demo us.gcr.io/[YOUR PROJECT ID HERE]/docker-demo $ gcloud docker push us.gcr.io/[YOUR PROJECT ID HERE]/docker-demo Congratulations, you have your first container in the cloud! Now let's deploy it. We're going to use Google's Compute Engine, not their Container Engine (besides the registry, but no cluster templates for us). Refer to this article, and if you're using your own app, you'll have to write up a container manifest. If you're using the docker-demo app from the first article, make sure to run a git pull to get an up-to-date version of the repo and notice that a containers.yaml manifest file has been added to the root of the application. containers.yaml apiVersion: v1 kind: Pod metadata: name: docker-demo spec: containers: - name: docker-demo image: us.gcr.io/[YOUR PROJECT ID HERE]/docker-demo imagePullPolicy: Always ports: - containerPort: 80 hostPort: 80 That file instructs the container-vm (purpose-built for running containers)-based VM we're about to create to pull the image and run it. Now let's run the gcloud command to create the VM in the cloud that will host the image, telling it to use the manifest. $ gcloud config set project [YOUR PROJECT ID HERE] $ gcloud compute instances create docker-demo --image container-vm --metadata-from-file google-container-manifest=containers.yaml --zone us-central1-a --machine-type f1-micro Launch the GCP Developer Console and set the firewall on your shiny new VM to 'Allow HTTP traffic'. Or run the following command. $ gcloud compute instances add-tags docker-demo --tags http-server --zone us-central1-a Either way, the previous gcloud compute instances create command should've given you the External (Public) IP of the VM, and navigating there from your browser will show the app. Congrats, you've now deployed a fully containerized web application to the cloud! If you're leaving this up, remember to reserve a static IP for your VM. I recommend consulting some of the documentation I've referenced here to monitor VM and container health as well. About the Author Darwin Corn is a systems analyst for the Consumer Direct Care Network. 
He is a mid-level professional with diverse experience in the information technology world.

AIO setup of OpenStack – preparing the infrastructure code environment

Packt
07 Jul 2016
5 min read
Viewing your OpenStack infrastructure deployment as code will not only simplify node configuration, but also improve the automation process. Despite the existence of numerous system-management tools to bring our OpenStack up and running in an automated way, we have chosen Ansible for the automation of our infrastructure. (For more resources related to this topic, see here.)

At the end of the day you can choose any automation tool that fits your production needs; the key point to keep in mind is that to manage a big production environment you must simplify operations by:

- Automating deployment and operation as much as possible
- Tracking your changes in a version control system
- Continuously integrating code to keep your infrastructure updated and bug free
- Monitoring and testing your infrastructure code to make it robust

We have chosen Git to be our version control system. Let's go ahead and install the Git package on our development system and check the correctness of the Git installation. If you decide to use an IDE like Eclipse for your development, it might be easier to install a Git plugin to integrate Git into your IDE. For example, the EGit plugin can be used to develop with Git in Eclipse. We do this by navigating to the Help | Install new software menu entry. You will need to add the following URL to install EGit: http://download.eclipse.org/egit/updates.

Preparing the development setup

The install process is divided into the following steps:

1. Check out the OSA repository.
2. Install and bootstrap Ansible.
3. Perform the initial host bootstrap.
4. Run the playbooks.

Configuring your setup

The AIO development environment uses the configuration file in test/roles/bootstrap-host/defaults/main.yml. This file describes the default values for the host configuration. In addition to the configuration file, configuration options can be passed through shell environment variables. The BOOTSTRAP_OPTS variable is read by the bootstrap script as space-separated key-value pairs. It can be used to pass values that override the default ones in the configuration file:

export BOOTSTRAP_OPTS="${BOOTSTRAP_OPTS} bootstrap_host_loopback_cinder_size=512"

OSA also allows overriding default values for service configuration. These override values are provided in the etc/openstack_deploy/user_variables.yml file. The following is an example of overriding values in nova.conf using the override file (see the short sketch a few paragraphs below):

nova_nova_conf_overrides:
  DEFAULT:
    remove_unused_original_minimum_age_seconds: 43200
  libvirt:
    cpu_mode: host-model
    disk_cachemodes: file=directsync,block=none
  database:
    idle_timeout: 300
    max_pool_size: 10

This override file will populate the nova.conf file with the following options:

[DEFAULT]
remove_unused_original_minimum_age_seconds = 43200

[libvirt]
cpu_mode = host-model
disk_cachemodes = file=directsync,block=none

[database]
idle_timeout = 300
max_pool_size = 10

The override variables can also be passed using a per-host configuration stanza in /etc/openstack_deploy/openstack_user_config.yml. The complete set of configuration options is described in the OpenStack-Ansible documentation at http://docs.openstack.org/developer/openstack-ansible/install-guide/configure-openstack.html.

Building the development setup

To start the installation process, execute the Ansible bootstrap script. This script will download and install the correct Ansible version. It also creates a wrapper script around ansible-playbook, called openstack-ansible, that always loads the OpenStack user variable files.
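The translation from the YAML override shown above into nova.conf sections is performed for you by the OSA playbooks, but it can help to see the mapping spelled out. The short Python sketch below only illustrates that mapping using PyYAML and configparser; it is not the mechanism OSA itself uses.

import configparser
import io
import yaml

OVERRIDES = """
DEFAULT:
  remove_unused_original_minimum_age_seconds: 43200
libvirt:
  cpu_mode: host-model
  disk_cachemodes: file=directsync,block=none
"""

parser = configparser.ConfigParser()
for section, options in yaml.safe_load(OVERRIDES).items():
    if section != "DEFAULT":
        parser.add_section(section)
    for key, value in options.items():
        parser.set(section, key, str(value))

out = io.StringIO()
parser.write(out)
print(out.getvalue())  # prints the [DEFAULT] and [libvirt] stanzas shown above

Each top-level key in the override dictionary becomes an INI section, and each nested key becomes an option inside it; that is all the override mechanism is doing on your behalf.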
The next step is to configure the system for the all-in-one setup. This script performs the following tasks:

- Applies Ansible roles to install the basic software requirements, like OpenSSH and pip. It also applies the bootstrap_host role to check the hard disk and swap space
- Creates various loopback volumes for use with Cinder, Swift, and Nova
- Prepares networking

Finally, we run the playbooks to bring up the AIO development environment. This script will execute the following tasks:

- Creates the LXC containers
- Applies security hardening to the host
- Reinitializes the network bridges
- Installs the infrastructure services like MySQL, RabbitMQ, memcached, and more
- Finally, installs the various OpenStack services

Running the playbooks takes a long time to build the containers and start the OpenStack services. Once finished, you will have all the OpenStack services running in their own private containers. You can use the lxc-ls command to list the service containers on the development machine. Use the lxc-attach command to connect to any container as shown here:

lxc-attach --name <name_of_container>

Use the name of the container from the output of lxc-ls to attach to the container. LXC commands can be used to start and stop the service containers. The AIO environment brings up a MySQL cluster, which needs special care to bring back up if the development machine is rebooted. Details of operating the AIO environment are available in the OpenStack-Ansible QuickStart guide at http://docs.openstack.org/developer/openstack-ansible/developer-docs/quickstart-aio.html.

Tracking your changes

The OSA project itself maintains its code under version control at the OpenStack git server (http://git.openstack.org/cgit/openstack/openstack-ansible/tree/). The configuration files of OSA are stored at /etc/openstack_ansible/ on the deployment host. These files define the deployment environment and the user override variables. To make sure that you control the deployment environment, it is important that changes to these configuration files are tracked in a version control system. To make sure that you track the development environment, make sure that the Vagrant configuration files are also tracked in a version control system.

Summary

So far, we've deployed a basic AIO setup of OpenStack. Mastering OpenStack Second Edition will take you through the process of extending our design by clustering, defining the various infrastructure nodes, controller, and compute hosts. Resources for Article: Further resources on this subject: Concepts for OpenStack [article] Introducing OpenStack Trove [article] OpenStack Performance, Availability [article]

Creating Multitenant Applications in Azure

Packt
21 Jun 2016
18 min read
This article, written by Roberto Freato and Marco Parenzan, is from the book Mastering Cloud Development using Microsoft Azure by Packt Publishing, and it teaches us how to create multitenant applications in Azure. This book guides you through the many efficient ways of mastering cloud services and using Microsoft Azure and its services to their maximum capacity. (For more resources related to this topic, see here.)

A tenant is a private space for a user or a group of users in an application. A typical way to identify a tenant is by its domain name. If multiple users share a domain name, we say that these users live inside the same tenant. If a group of users uses a different reserved domain name, they live in a reserved tenant. From this, we can infer that different names are used to identify different tenants. Different domain names can imply different app instances, but we cannot say the same about deployed resources.

Multitenancy is one of the founding principles of cloud computing. Developers need to reach economies of scale, which allow every cloud user to scale as needed without paying for overprovisioned resources or suffering from underprovisioned resources. To do this, the cloud infrastructure needs to be oversized for a single user and sized for a pool of potential users that share the same group of resources during a certain period of time.

Multitenancy is a pattern. Legacy on-premise applications usually tend to be single-tenant apps, shared between users, because of the lack of specific DevOps tasks: provisioning an app for every user can be a costly operation. Cloud environments invite reserving a single tenant for each user (or group of users) to enforce better security policies and to customize tenants for specific users, because all DevOps tasks can be automated via management APIs. The cloud invites reserving resource instances for a tenant and deploying a group of tenants on the same resources. In general, this is a new way of handling app deployment. We will now take a look at how to develop an app in this way.

Scenario

CloudMakers.xyz, a cloud-based development company, decided to develop a personal accountant web application—MyAccountant. Professionals or small companies can register themselves on this app as a single customer and record all of their invoices on it. A single customer represents the tenant; different companies use different tenants. Every tenant needs its own private data to enforce data security, so we will reserve a dedicated database for each tenant. Access to a single database is not an intensive task because invoice registration will generally occur once daily. Every tenant will have its own domain name to enforce company identity.

A new tenant can be created from the company portal application, where new customers register themselves, specifying the tenant name. For sample purposes, without the objective of creating production-quality styling, we use the default ASP.NET MVC templates to style and build up the apps and focus on tenant topics.

Creating the tenant app

A tenant app is an invoice recording application. To brand the tenant, we record the tenant name in the app settings inside the web.config file:

<add key="TenantName" value="{put_your_tenant_name}" />

To keep things simple, we "brand" the application by displaying the stored tenant name in the main layout file, where the application name is normally displayed. The application content is represented by an Invoices page where we record data with a CRUD process.
The entry for the Invoices page is in the Navigation bar:

<ul class="nav navbar-nav">
    <li>@Html.ActionLink("Home", "Index", "Home")</li>
    <li>@Html.ActionLink("Invoices", "Index", "Invoices")</li>
    <!-- other code omitted -->

First, we need to define a model for the application in the Models folder. As we need to store data in an Azure SQL database, we can use Entity Framework to create the model, starting from an empty Code First model:

public class InvoicesModel : DbContext
{
    public InvoicesModel() : base("name=InvoicesModel")
    {
    }

    public virtual DbSet<Invoice> Invoices { get; set; }
}

As we can see, data will be accessed through a SQL database that is referenced by a connectionString in the web.config file:

<add name="InvoicesModel" connectionString="data source=(LocalDb)\MSSQLLocalDB;initial catalog=Tenant.Web.Models.InvoicesModel;integrated security=True;MultipleActiveResultSets=True;App=EntityFramework" providerName="System.Data.SqlClient" />

This model class is just for demo purposes:

public class Invoice
{
    public int InvoiceId { get; set; }
    public int Number { get; set; }
    public DateTime Date { get; set; }
    public string Customer { get; set; }
    public decimal Amount { get; set; }
    public DateTime DueDate { get; set; }
}

After this, we compile the project to check that we have not made any mistakes. We can now scaffold this model into an MVC controller so that we have a simple but working app skeleton.

Creating the portal app
We now need to create the portal app, starting from the MVC default template. Its registration workflow is a useful starting point for our tenant registration: we use user registration as the tenant registration, acquiring the tenant name and triggering tenant deployment. We need to make two changes to the UI. First, in the RegisterViewModel defined under the Models folder, we add a TenantName property to the AccountViewModels.cs file:

public class RegisterViewModel
{
    [Required]
    [Display(Name = "Tenant Name")]
    public string TenantName { get; set; }

    [Required]
    [EmailAddress]
    [Display(Name = "Email")]
    public string Email { get; set; }

    // other code omitted
}

In the Register.cshtml view page under the Views\Account folder, we add an input box:

@using (Html.BeginForm("Register", "Account", FormMethod.Post, new { @class = "form-horizontal", role = "form" }))
{
    @Html.AntiForgeryToken()
    <h4>Create a new account.</h4>
    <hr />
    @Html.ValidationSummary("", new { @class = "text-danger" })
    <div class="form-group">
        @Html.LabelFor(m => m.TenantName, new { @class = "col-md-2 control-label" })
        <div class="col-md-10">
            @Html.TextBoxFor(m => m.TenantName, new { @class = "form-control" })
        </div>
    </div>
    <div class="form-group">
        @Html.LabelFor(m => m.Email, new { @class = "col-md-2 control-label" })
        <div class="col-md-10">
            @Html.TextBoxFor(m => m.Email, new { @class = "form-control" })
        </div>
    </div>
    <!-- other code omitted -->
}

The portal application is also a good place for the tenant owner to manage their own tenant and to handle configuration or subscription-related tasks with the supplier company.

Deploying the portal application
Before tenant deployment, we need to deploy the portal itself. MyAccountant is a complex solution made up of multiple Azure services, which need to be deployed together. First, we need to create an Azure Resource Group to collect all the services: As we already discussed earlier, all data from different tenants, including the portal itself, needs to be contained inside distinct Azure SQL databases.
Every tenant will have its own database as a personal service, which is not used frequently. It can be a waste of money to assign a reserved quantity of Database Transaction Units (DTUs) to a single database, so we can instead invest in a pool of DTUs that is shared among all the SQL database instances. We begin by creating a SQL Server service from the portal:

We need to create a pool of DTUs, which is shared among databases, and configure the pricing tier, which defines the maximum resource allocation per DB:

The first database that we need to deploy manually is the portal database, where users will register as tenants. From the MyAccountantPool blade, we can create a new database that will be immediately associated with the pool:

From the database blade, we read the connection string:

We use this connection string to configure the portal app in web.config:

<connectionStrings>
    <add name="DefaultConnection" connectionString="Server=tcp:{portal_db}.database.windows.net,1433;Data Source={portal_db}.database.windows.net;Initial Catalog=Portal;Persist Security Info=False;User ID={your_username};Password={your_password};Pooling=False;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;" providerName="System.Data.SqlClient" />
</connectionStrings>

We also need to create a shared resource for the web tier. In this case, we need to create an App Service Plan where we'll host the portal and tenant apps. The initial size is not a problem, because we can decide to scale up or scale out the solution at any time (scaling out only when the application is able to scale out; we don't handle this scenario here). Then, we create the portal web app and associate it with the service plan that we just created:

The portal can be deployed from Visual Studio to the Azure subscription by right-clicking on the project root in Solution Explorer, selecting Publish, and choosing Microsoft Azure Web App as the target. After deployment, the portal is up and running:

Deploy the tenant app
After tenant registration from the portal, we need to deploy the tenant itself, which is made up of the following:
The app itself, which is considered the artifact that has to be deployed
A web app that runs the app, hosted on the already defined web app plan
The Azure SQL database that contains data inside the elastic pool
The connection string that connects the database to the web app in the web.config file

It's a complex activity because it involves many different resources and different kinds of tasks, from deployment to configuration. For this purpose, we have the Azure Resource Group project in Visual Studio, where we can configure web app deployment and configuration via Azure Resource Manager templates. This project will be called Tenant.Deploy, and we choose a blank template to do this. In the azuredeploy.json file, we can type a template such as https://github.com/marcoparenzan/CreateMultitenantAppsInAzure/blob/master/Tenant.Deploy/Templates/azuredeploy.json. This template is quite complex. Remember that in the SQL connection string, the username and password should be provided inside the template. We need to reference the Tenant.Web project from the deployment project because we need to deploy the tenant artifacts (the project bits). To support deployment, back in the Azure portal, we need to create an Azure Storage Account:

To understand how it works, we can manually run a deployment directly from Visual Studio by right-clicking on the deployment project in Solution Explorer and selecting Deploy.
When we deploy a "sample" tenant, the first dialog will appear. You can connect to the Azure subscription, selecting an existing resource group or creating a new one and the template that describes the deployment composition. The template requires the following parameters from Edit Parameters window: The tenant name The artifact location and SAS token that are automatically added having selected the Azure Storage account from the previous dialog Now, via the included Deploy-AzureResourceGroup.ps1 PowerShell file, Azure resources are deployed. The artifact is copied with AzCopy.exe command to the Azure storage in the Tenant.Web container as a package.zip file and the resource manager starts allocating resources. We can see that tenant is deployed in the following screenshot: Automating the tenant deployment process Now, in order to complete our solution, we need to invoke this deployment process from the portal application during a registration process call in ASP.NET MVC controls. For the purpose of this article, we will just invoke the execution without defining a production-quality deployment process. We can use the following checklist before proceeding: We already have an Azure Resource Manager template that deploys the tenant app customized for the user Deployment is made with a PowerShell script in the Visual Studio deployment project A new registered user for our application does not have an Azure account; we, as service publisher, need to offer a dedicated Azure account with our credentials to deploy the new tenants Azure offers many different ways to interact with an Azure subscription: The classic portal (https://manage.windowsazure.com) The new portal (https://portal.azure.com) The resource portal (https://resources.azure.com) The Azure REST API (https://msdn.microsoft.com/en-us/library/azure/mt420159.aspx) The Azure .NET SDK (https://github.com/Azure/azure-sdk-for-net) and other platforms The Azure CLI open source CLI (https://github.com/Azure/azure-xplat-cli) PowerShell (https://github.com/Azure/azure-powershell) For our needs, this means integrating in our application. We can make these considerations: We need to reuse the same ARM template that we defined We can reuse PowerShell experience, but we can also use our experience as .NET, REST, or other platform developers Authentication is the real discriminator in our solution: the user is not an Azure subscription user and we don't want to make a constraint on this Interacting with Azure REST API, which is the API on which every other solution depends, requires that all invocations need to be authenticated to the Azure Active Directory of the subscription tenant. We already mentioned that the user is not a subscription-authenticated user. Therefore, we need an unattended authentication to our Azure API subscription using a dedicated user for this purpose, encapsulated into a component that is executed by the ASP.NET MVC application in a secure manner to make the tenant deployment. The only environment that offers an out-of-the box solution for our needs (so that we need to write less code) is the Azure Automation Service. Before proceeding, we create a dedicated user for this purpose. Therefore, for security reasons, we can disable a specific user at any time. You should take note of two things: Never use the credentials that you used to register Azure subscription in a production environment! For automation implementation, you need a Azure AD tenant user, so you cannot use Microsoft accounts (Live or Hotmail). 
To create the user, we need to go to the classic portal, as Azure Active Directory has no equivalent management UI in the new portal. We need to select the tenant directory, that is, the one visible in the upper right corner of the new portal. From the classic portal, go to Azure Active Directory and select the tenant. Click on Add User and type in a new username:

Then, go to Administrator Management in the Settings tab of the portal, because we need to define the user as a co-administrator in the subscription that we will use for deployment. Now, with the temporary password, we need to log in manually to https://portal.azure.com/ (open the browser in private mode) with these credentials, because we need to change the password, as it is generated in an "expired" state. We are now ready to proceed.

Back in the new portal, we select a new Azure Automation account:

The first thing that we need to do inside the account is create a credential asset to store the newly created AAD credentials and use them inside PowerShell scripts to log on to Azure:

We can now create a runbook, which is an automation task that can be expressed in different ways:
Graphical
PowerShell

We choose the second one:

As we can edit it directly from the portal, we can write a PowerShell script for our purposes. This is an adaptation of the script that we used in the deployment project inside Visual Studio. The difference is that it is runnable inside a runbook in Azure, and it uses the artifacts that are already deployed to the Azure Storage account that we created before. Before proceeding, we need two IDs from our subscription:
The subscription ID
The tenant ID

These two parameters can be discovered with PowerShell: run Login-AzureRmAccount from the command line and copy them from the output:

The following code is not production quality (it needs some optimization), but it works for demo purposes:

param (
    $WebhookData,
    $TenantName
)

# If the runbook was called from a Webhook, WebhookData will not be null.
if ($WebhookData -ne $null)
{
    $Body = ConvertFrom-Json -InputObject $WebhookData.RequestBody
    $TenantName = $Body.TenantName
}

# Authenticate to Azure resources by retrieving the credential asset
$Credentials = Get-AutomationPSCredential -Name "myaccountant"
$subscriptionId = '{your subscriptionId}'
$tenantId = '{your tenantId}'
Login-AzureRmAccount -Credential $Credentials -SubscriptionId $subscriptionId -TenantId $tenantId

$artifactsLocation = 'https://myaccountant.blob.core.windows.net/myaccountant-stageartifacts'
$ResourceGroupName = 'MyAccountant'

# Generate a temporary StorageSasToken (in a SecureString form) to give the ARM
# template access to the template artifacts
$StorageAccountName = 'myaccountant'
$StorageContainer = 'myaccountant-stageartifacts'
$StorageAccountKey = (Get-AzureRmStorageAccountKey -ResourceGroupName $ResourceGroupName -Name $StorageAccountName).Key1
$StorageAccountContext = (Get-AzureRmStorageAccount -ResourceGroupName $ResourceGroupName -Name $StorageAccountName).Context
$StorageSasToken = New-AzureStorageContainerSASToken -Container $StorageContainer -Context $StorageAccountContext -Permission r -ExpiryTime (Get-Date).AddHours(4)
$SecureStorageSasToken = ConvertTo-SecureString $StorageSasToken -AsPlainText -Force

# Prepare parameters for the template
$ParameterObject = New-Object -TypeName Hashtable
$ParameterObject['TenantName'] = $TenantName
$ParameterObject['_artifactsLocation'] = $artifactsLocation
$ParameterObject['_artifactsLocationSasToken'] = $SecureStorageSasToken

$deploymentName = 'MyAccountant' + '-' + $TenantName + '-' + ((Get-Date).ToUniversalTime()).ToString('MMdd-HHmm')
$templateLocation = $artifactsLocation + '/Tenant.Deploy/Templates/azuredeploy.json' + $StorageSasToken

# Execute
New-AzureRmResourceGroupDeployment -Name $deploymentName `
    -ResourceGroupName $ResourceGroupName `
    -TemplateFile $templateLocation `
    @ParameterObject `
    -Force -Verbose

The script is executable in the Test pane, but for production purposes, it needs to be deployed with the Publish button. Now, we need to execute this runbook from outside the ASP.NET MVC portal that we already created. We can use Webhooks for this purpose. Webhooks are user-defined HTTP callbacks that are usually triggered by some event; in our case, this is new tenant registration. As they use HTTP, they can be integrated into web services without adding new infrastructure. Runbooks can be exposed directly as Webhooks, which provide an HTTP endpoint natively, without the need to provide one ourselves. We need to remember some things:
Webhooks are public, with a shared secret in the URL, so they are only "secure" if we don't share that URL
The shared secret expires, so we need to handle Webhook renewal in the service lifecycle
If more users are needed, more Webhooks are needed, as the URL is the only way to recognize who invoked it (again, don't share Webhooks)
Copy the URL at this stage, as it is not possible to recover it later; otherwise the Webhook has to be deleted and a new one generated

Write it directly into the portal's web.config app settings:

<add key="DeplyNewTenantWebHook" value="https://s2events.azure-automation.net/webhooks?token={your_token}"/>

We can set some default parameters if needed, then we can create it.
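Before wiring the call into the portal, the Webhook can be smoke-tested from any HTTP client. Here is a hedged sketch in Python using the third-party requests package; the token in the URL and the tenant name are placeholders, and the response shape shown in the comments is only what Azure Automation typically returns (a 202 with the started job identifiers):

# Quick smoke test of the runbook Webhook from outside the portal.
# Never commit or share the real Webhook URL.
import requests

WEBHOOK_URL = "https://s2events.azure-automation.net/webhooks?token={your_token}"

response = requests.post(
    WEBHOOK_URL,
    json={"TenantName": "sampletenant"},  # same JSON body the portal will send
    timeout=30,
)
response.raise_for_status()

print(response.status_code)  # typically 202 Accepted
print(response.json())       # typically a JSON body listing the started JobIds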
To invoke the Webhook, we use System.Net.HttpClient to create a POST request, placing a JSON object containing TenantName in the body: var requestBody = new { TenantName = model.TenantName }; var httpClient = new HttpClient(); var responseMessage = await httpClient.PostAsync( ConfigurationManager.AppSettings ["DeplyNewTenantWebHook"], new StringContent(JsonConvert.SerializeObject (requestBody)) ); This code is used to customize the registration process in AccountController: public async Task<ActionResult> Register(RegisterViewModel model) { if (ModelState.IsValid) { var user = new ApplicationUser { UserName = model.Email, Email = model.Email }; var result = await UserManager.CreateAsync(user, model.Password); if (result.Succeeded) { await SignInManager.SignInAsync(user, isPersistent:false, rememberBrowser:false); // handle webhook invocation here return RedirectToAction("Index", "Home"); } AddErrors(result); } The responseMessage is again a JSON object that contains JobId that we can use to programmatically access the executed job. Conclusion There are a lot of things that can be done with the set of topics that we covered in this article. These are a few of them: We can write better .NET code for multitenant apps We can authenticate users on with the Azure Active Directory service We can leverage deployment tasks with Azure Service Bus messaging We can create more interaction and feedback during tenant deployment We can learn how to customize ARM templates to deploy other Azure Storage services, such as DocumentDB, Azure Storage, and Azure Search We can handle more PowerShell for the Azure Management tasks Summary Azure can change the way we write our solutions, giving us a set of new patterns and powerful services to develop with. In particular, we learned how to think about multitenant apps to ensure confidentiality to the users. We looked at deploying ASP.NET web apps in app services and providing computing resources with App Services Plans. We looked at how to deploy SQL in Azure SQL databases and computing resources with elastic pool. We declared a deployment script with Azure Resource Manager, Azure Resource Template with Visual Studio cloud deployment projects, and automated ARM PowerShell script execution with Azure Automation and runbooks. The content we looked at in the earlier section will be content for future articles. Code can be found on GitHub at https://github.com/marcoparenzan/CreateMultitenantAppsInAzure. Have fun! Resources for Article: Further resources on this subject: Introduction to Microsoft Azure Cloud Services [article] Microsoft Azure – Developing Web API for Mobile Apps [article] Security in Microsoft Azure [article]

Packt
14 Mar 2016
8 min read
Save for later

Watson Analytics – Predict

Packt
14 Mar 2016
8 min read
In this article by James Miller, author of the book Learning IBM Watson Analytics, we will discuss the mining insights—those previously unknown—from your data. This typically requires complex modeling using sophisticated algorithms to process the data. With Watson though, you don't have to know which statistical test to run on your data or even how any of the algorithms actually work. The method you use with Watson is so much simpler: identify/refine your data, create a prediction, and then view the results—that's it! We have already covered identifying and refining data, so let's now look at predictions and how one would create a prediction. First, think of predictions as your virtual folders for each predictive analysis effort you are working on. Here, you identify your data, specify field properties within the data, and select targets and inputs. After you create the prediction, you can view it to see the output from the analysis. The output consists of visual and text insights. (For more resources related to this topic, see here.) Creating a Watson prediction The steps for creating a Watson prediction are straightforward: Starting on the Welcome page, click on Predict, as shown in the following screenshot: Next, on the Create new prediction dialog, you select a previously uploaded dataset from the list (or upload new data) that you want Watson Analytics to analyze: On the Create a new analysis page (shown in the next screenshot) we set some attributes for our prediction by the following ways: Giving it a name by entering it the Name your workbook field. Targets are the fields you may be most interested in and want to know more about. These are the fields that are perhaps influenced by other fields in the data. When creating a new prediction, Watson defines default targets and field properties for you, which you can remove (by clicking on the Delete icon next to it), and then add your own choices (by clicking on Select target). Keep in mind that all predictions must have at least one target (and up to five). Finally, click on Create. Once you have clicked on Create, Watson will generate the prediction. The following screenshot shows a prediction generated based on a Watson sample dataset: Viewing the results of a prediction Once a Watson prediction has been generated, you can view its results. Predictor visualization bar Across the top of the prediction page is the Top Predictors Bar (shown in the following screenshot), where you can click on To select a particular predictor that is interesting to you. Main Insights On the Main Insight section of the prediction page (shown below for our example), you can examine the top insights that Watson was able to derive from the data. Details From the Main Insights section, you can access (by clicking on the top predictor found; this is shown circled below) the Details page, which gives you the ability to drill into the details for individual fields and interactions of your prediction. Customization After you view the results, you might want to customize the prediction to refine the analysis to produce additional insights. IBM Watson allows you to change the number of targets and see the effect of the change on the prediction results. In addition, Watson allows you to save your updated prediction or revert at any time to any particular version as desired. 
Watson Analytics Assemble The Watson Assemble feature is where you can actually organize or assemble the most interesting or otherwise important artifacts exposed while using Watson to predict or to explore your data files (as well as other items collected or otherwise set aside during previous Assemble sessions). This, in a way, is where you can do some programming to create powerful methods of conveying information to others. Watson breaks assembly into two types, Views and Dashboards, both of which are made up of visualizations (visualizations are defined as a graph, chart, plot, table, map, or any other visual representation of data). Views Views are customizable containers for dashboards (defined below) and stories (sets of views over time). Dashboards Dashboards are a specific type of view that help monitor events or activities at a glance. A little help To make it easier to assemble your views and dashboards, Watson Analytics provides you with templates that contain predefined layouts and grid lines for easy arrangement and alignment of the visualizations in a view. As we did with predictions earlier, let's take a look at how the Assemble process works. From the main or welcome page, click on the plus or Add New icon (shown in the image below) and then click on Assemble: While creating a new Assemble, you'll need to choose a data file (shown in the image below) from the list displayed on the Create new view dialog (of course, you can also upload a new file). Once you select which data file you want to use (simply by clicking on the filename), Watson shows you the Create View page, as shown in the following screenshot: Notice that the Name your view field defaults to the name of the file that you selected, and you'll want to change that. Click in the textbox provided and type an appropriate name for what you are creating: Once you have entered a name for your view, you'll need to decide whether you'd like to assemble a Dashboard or a Story. Along the left side of the page (under Select a template), you can scroll vertically through a list of content types that you can use to organize your visualizations. We'll get much deeper into the process of assembling, but for now, let's select Dashboard (by clicking on the word Dashboard) and then Single Page layout (by double-clicking on the highlighted rectangle labeled Freeform). Watson will save your new dashboard and the template with a blank canvas opened (as shown here): Notice the Data set icon (circled in the following screenshot) at the bottom of the canvas. Under the dataset icon, the Data set list icon, the name of the dataset, and data columns are displayed. The list of data columns are in what is referred to as the Data tray. If you click on the Data set icon, the information below it is hidden; click on it again and the information reappears. Using the above, you can add columns to the canvas by Dragging them from the Data tray. Selecting a column (or multiple columns) from the Data set list. Selecting a column from a different data set. This is done by clicking on the dataset list icon and then the < icon to view and select a different dataset. Besides adding columns of data, you can add visualizations by clicking on the Visualization icon (shown in the following image) and selecting a visualization type that you want to use. Moving to the right (from the Visualizations icon), we have additional icons providing various other options. 
These are text, media, web page, image and shapes, each allowing you to add and enhance your dashboard view. The far-right icon (shown in the following screenshot) is the Properties icon. This icon allows you to change your dashboard's Theme and General Style. As of now, only a few themes and styles are available, but more are planned. Another option for enhancing your dashboard, should the above not be sufficient, is to access your Watson collection (by clicking on the collection icon on the far right of the main toolbar shown below) and drag selections from the collection list to the dashboard canvas. Finally, if nothing else suits your needs, you can have Watson create a new visualization based on a question you type in the What do you want to assemble? field (shown in the following screenshot): A simple use case To gain a better understanding of how to use the Watson Predict and Assemble features, let's now take a look at a simple use case. One of the best ways to learn a new tool is by using it, and to use Watson, you need data. Up to this point, we've utilized sample data for use cases that I created from various sources, but Watson has made many sample datasets available for use for your learning. To view the sample data options, simply click on Add from the main or Welcome page and then click on Sample Data: For more information about the available Watson-supplied sample data, you can go to https://community.watsonanalytics.com/resources. Summary We learned how to create prediction and to see the output from the analysis. Resources for Article: Further resources on this subject: Messaging with WebSphere Application Server 7.0 (Part 1) [article] Programming on Raspbian [article] se of macros in IBM Cognos 8 Report Studio [article]
article-image-vm-it-not-what-you-think
Packt
10 Mar 2016
10 min read
Save for later

VM, It Is Not What You Think!

Packt
10 Mar 2016
10 min read
In this article by Iwan 'e1' Rahabok, the author of the book VMware Performance and Capacity Management, Second Edition, we will look at why a seemingly simple technology, a virtualized x86 machine, has huge ramifications for the IT industry. In fact, it is turning a lot of things upside down and breaking down silos that have existed for decades in large IT organizations. We will cover the following topics: Why virtualization is not what we think it is Virtualization versus partitioning A comparison between a physical server and a virtual machine (For more resources related to this topic, see here.) Our journey into the virtual world A virtual machine, or simply, VM - who doesn't know what it is? Even a business user who has never seen one knows what it is. It is just a physical server, virtualized. Nothing more. Wise men say that small leaks sink the ship. This is a good way to explain why IT departments that manage physical servers well struggle when the same servers are virtualized. We can also use the Pareto principle (80/20 rule). 80 percent of a VM is identical to a physical server. But it's the 20 percent of difference that hits you. We will highlight some of this 20 percent portion, focusing on areas that impact data center management. The change caused by virtualization is much larger than the changes brought about by previous technologies. In the past two or more decades, we transitioned from mainframes to the client/server-based model and then to the web-based model. These are commonly agreed upon as the main evolutions in IT architecture. However, all of these are just technological changes. They changed the architecture, yes, but they did not change the operation in a fundamental way. Both the client-server and web shifts did not talk about the journey. There was no journey to the client-server based model. However, with virtualization, we talk about the journey. It is a journey because the changes are massive and involve a lot of people. In 2007, Gartner correctly predicted the impact of virtualization (http://www.gartner.com/newsroom/id/505040). More than 8 years later, we are still in the midst of the journey. Proving how pervasive the change is, here is the summary on the article from Gartner: Notice how Gartner talks about a change in culture. Virtualization has a cultural impact too. In fact, if your virtualization journey is not fast enough, look at your organization's structure and culture. Have you broken the silos? Do you empower your people to take risks and do things that have never been done before? Are you willing to flatten the organizational chart? The silos that have served you well are likely your number one barrier to a hybrid cloud. So why exactly is virtualization causing such a fundamental shift? To understand this, we need to go back to the basics, which is exactly what virtualization is. It's pretty common that chief information officers (CIOs) have a misconception about what it is. Take a look at the following comments. Have you seen them in your organization? VM is just a virtualized physical machine. Even VMware says that the guest OS is not aware it's virtualized and that it does not run differently. It is still about monitoring CPU, RAM, disk, network, and other resources. No difference. It is a technological change. Our management process does not have to change. All of these VMs must still feed into our main enterprise IT management system. This is how we have run our business for decades, and it works. 
If only life were that simple, we would all be 100-percent virtualized and have no headaches! Virtualization has been around for years, and yet, most organizations have not mastered it. The proof of mastering it is when you have completed the journey and have reached the highest level of the virtualization maturity model. Not all virtualizations are equal There are plenty of misconceptions about the topic of virtualization, especially among IT folks who are not familiar with virtualization. CIOs who have not felt the strategic impact of virtualization (be it a good or bad experience) tend to carry these misconceptions. Although virtualization looks similar to a physical system from the outside, it is completely re-architected under the hood. So, let's take a look at the first misconception: what exactly is virtualization? Because it is an industry trend, virtualization is often generalized to include other technologies that are not virtualized. This is a typical strategy by IT vendors who have similar technologies. A popular technology often branded under virtualization is hardware partitioning; since it is parked under the umbrella of virtualization, both should be managed in the same way. Since both are actually different, customers who try to manage both with a single piece of management software struggle to do well. Partitioning and virtualization are two different architectures in computer engineering, resulting in there being major differences between their functionalities. They are shown in the following screenshot: Virtualization versus partitioning With partitioning, there is no hypervisor that virtualizes the underlying hardware. There is no software layer separating the VM and the physical motherboard. There is, in fact, no VM. This is why some technical manuals for partitioning technology do not even use the term VM. The manuals use the term domain, partition, or container instead. There are two variants of partitioning technology, hardware-level and OS-level partitioning, which are covered in the following bullet points: In hardware-level partitioning, each partition runs directly on the hardware. It is not virtualized. This is why it is more scalable and has less of a performance hit. Because it is not virtualized, it has to have an awareness of the underlying hardware. As a result, it is not fully portable. You cannot move the partition from one hardware model to another. The hardware has to be built for a purpose to support that specific version of the partition. The partitioned OS still needs all the hardware drivers and will not work on other hardware if the compatibility matrix does not match. As a result, even the version of the OS matters, as it is just like a physical server. In OS-level partitioning, there is a parent OS that runs directly on the server motherboard. This OS then creates an "OS partition", where other OSes can run. We use double quotes as it is not exactly the full OS that runs inside that partition. The OS has to be modified and qualified to be able to run as a zone or container. Because of this, application compatibility is affected. This is different in a VM, where there is no application compatibility issue as the hypervisor is transparent to the guest OS. Hardware partitioning We covered the difference between virtualization and partitioning from an engineering point of view. However, does it translate into different data center architectures and operations? 
We will focus on hardware partitioning since there are fundamental differences between hardware partitioning and software partitioning. The use case for both is also different. Software partitioning is typically used in native cloud applications. With that, let's do a comparison between hardware partitioning and virtualization. We will start with availability. With virtualization, all VMs are protected by vSphere High Availability (vSphere HA), which provides 100 percent protection and that too without VM awareness. Nothing needs to be done at the VM layer. No shared or quorum disk and no heartbeat-network VM is required to protect a VM with basic HA. With hardware partitioning, the protection has to be configured manually, one by one for each logical partition (LPAR) or logical domain (LDOM). The underlying platform does not provide that. With virtualization, you can even go beyond five nines (99.999 percent) and move to 100 percent with vSphere Fault Tolerance. This is not possible in the partitioning approach as there is no hypervisor that replays CPU instructions. Also, because it is virtualized and transparent to the VM, you can turn the Fault Tolerance capability on and off on demand. Fault Tolerance is completely defined in the software. Another area of difference between partitioning and virtualization is disaster recovery (DR). With partitioning technology, the DR site requires another instance to protect the production instance. It is a different instance, with its own OS image, hostname, and IP address. Yes, we can perform a Storage Area Network (SAN) boot, but that means another Logical Unit Number (LUN) is required to manage, zone, replicate, and so on. Disaster recovery is not scalable to thousands of servers. To make it scalable, it has to be simpler. Compared to partitioning, virtualization takes a different approach. The entire VM fits inside a folder; it becomes like a document and we migrate the entire folder as if the folder is one object. This is what vSphere Replication or Site Recovery Manager do. They perform a replication per VM; there is no need to configure a SAN boot. The entire DR exercise, which can cover thousands of virtual servers, is completely automated and has audit logs automatically generated. Many large enterprises have automated their DR with virtualization. There is probably no company that has automated DR for their entire LPAR, LDOM, or container. In the previous paragraph, we're not implying LUN-based or hardware-based replication as inferior solutions. We're merely driving the point that virtualization enables you to do things differently. We're also not saying that hardware partitioning is an inferior technology. Every technology has its advantages and disadvantages and addresses different use cases. Before joining VMware, the author was a Sun Microsystems sales engineer for five years, so he is aware of the benefits of UNIX partitioning. This article is merely trying to dispel the misunderstanding that hardware partitioning equals virtualization. OS partitioning We've covered the differences between hardware partitioning and virtualization. Let's switch gear to software partitioning. In 2016, the adoption of Linux containers will continue its rapid rise. You can actually use both containers and virtualization, and they complement each other in some use cases. There are two main approaches to deploying containers: Run them directly on bare metal Run them inside a virtual machine As both technologies evolve, the gap gets wider. 
As a result, managing a software partition is different from managing a VM. Securing a container is different to securing a VM. Be careful when opting for a management solution that claims to manage both. You will probably end up with the most common denominator. This is one reason why VMware is working on vSphere Integrated Containers and the Photon platform. Now that's a separate topic by itself! Summary We hope you enjoyed the comparison and found it useful. We covered, to a great extent, the impact caused by virtualization and the changes it introduces. We started by clarifying that virtualization is a different technology compared to partitioning. We then explained that once a physical server is converted to a virtual machine, it takes on a different form and has radically different properties. Resources for Article: Further resources on this subject: Deploying New Hosts with vCenter [article] VMware vCenter Operations Manager Essentials - Introduction to vCenter Operations Manager [article] VMware vRealize Operations Performance and Capacity Management [article]

Packt
09 Mar 2016
13 min read
Save for later

Keystone – OpenStack Identity Service

Packt
09 Mar 2016
13 min read
In this article by Cody Bunch, Kevin Jackson and, Egle Sigler, the authors of  OpenStack Cloud Computing Cookbook, Third Edition, we will cover the following topics: Creating tenants in Keystone Configuring roles in Keystone Adding users to Keystone (For more resources related to this topic, see here.) The OpenStack Identity service, known as Keystone, provides services for authenticating and managing user accounts and role information for our OpenStack cloud environment. It is a crucial service that underpins the authentication and verification between all of our OpenStack cloud services and is the first service that needs to be installed within an OpenStack environment. The OpenStack Identity service authenticates users and tenants by sending a validated authorization token between all OpenStack services. This token is used for authentication and verification so that you can use that service, such as OpenStack Storage and Compute. Therefore, configuration of the OpenStack Identity service must be completed first, consisting of creating appropriate roles for users and services, tenants, the user accounts, and the service API endpoints that make up our cloud infrastructure. In Keystone, we have the concepts of tenants, roles and users. A tenant is like a project and has resources such as users, images, and instances, as well as networks in it that are only known to that particular project. A user can belong to one or more tenants and is able to switch between these projects to gain access to those resources. Users within a tenant can have various roles assigned. In the most basic scenario, a user can be assigned either the role of admin or just be a member. When a user has admin privileges within a tenant, they are able to utilize features that can affect the tenant (such as modifying external networks), whereas a normal user is assigned the member role, which is generally assigned to perform user-related roles, such as spinning up instances, creating volumes, and creating tenant only networks. Creating tenants in Keystone A tenant in OpenStack is a project, and the two terms are generally used interchangeably. Users can't be created without having a tenant assigned to them, so these must be created first. Here, we will create a tenant called cookbook for our users. Getting ready We will be using the keystone client to operate Keystone. If the python-keystoneclient tool isn't available, follow the steps described at http://bit.ly/OpenStackCookbookClientInstall. Ensure that we have our environment set correctly to access our OpenStack environment for administrative purposes: export OS_TENANT_NAME=cookbook export OS_USERNAME=admin export OS_PASSWORD=openstack export OS_AUTH_URL=https://192.168.100.200:5000/v2.0/ export OS_NO_CACHE=1 export OS_KEY=/vagrant/cakey.pem export OS_CACERT=/vagrant/ca.pem You can use the controller node if no other machines are available on your network, as this has the python-keystoneclient and the relevant access to the OpenStack environment. If you are using the Vagrant environment issue the following command to get access to the Controller: vagrant ssh controller How to do it... 
To create a tenant in our OpenStack environment, perform the following steps: We start by creating a tenant called cookbook: keystone tenant-create \     --name cookbook \     --description "Default Cookbook Tenant" \     --enabled true This will produce output similar to: +-------------+----------------------------------+|   Property  |              Value               |+-------------+----------------------------------+| description |     Default Cookbook Tenant      ||   enabled   |               True               ||      id     | fba7b31689714d1ab39a751bc9483efd ||     name    |             cookbook             |+-------------+----------------------------------+ We also need an admin tenant so that when we create users in this tenant, they have access to our complete environment. We do this in the same way as in the previous step: keystone tenant-create \     --name admin \     --description "Admin Tenant" \     --enabled true How it works... Creation of the tenants is achieved by using the keystone client, specifying the tenant-create option with the following syntax: keystone tenant-create \     --name tenant_name \     --description "A description" \     --enabled true The tenant_name is an arbitrary string and must not contain spaces. On creation of the tenant, this returns an ID associated with it that we use when adding users to this tenant. To see a list of tenants and the associated IDs in our environment, we can issue the following command: keystone tenant-list Configuring roles in Keystone Roles are the permissions given to users within a tenant. Here, we will configure two roles: an admin role that allows for the administration of our environment, and a member role that is given to ordinary users who will be using the cloud environment. Getting ready We will be using the keystone client to operate Keystone. If the python-keystoneclient tool isn't available, follow the steps described at http://bit.ly/OpenStackCookbookClientInstall. Ensure that we have our environment set correctly to access our OpenStack environment for administrative purposes: export OS_TENANT_NAME=cookbook export OS_USERNAME=admin export OS_PASSWORD=openstack export OS_AUTH_URL=https://192.168.100.200:5000/v2.0/ export OS_NO_CACHE=1 export OS_KEY=/vagrant/cakey.pem export OS_CACERT=/vagrant/ca.pem You can use the controller node if no other machines are available on your network, as this has the python-keystoneclient and the relevant access to the OpenStack environment. If you are using the Vagrant environment, issue the following command to get access to the Controller: vagrant ssh controller How to do it... To create the required roles in our OpenStack environment, perform the following steps: Create the admin role as follows: # admin role keystone role-create --name admin You will get an output like this: +----------+----------------------------------+| Property |              Value               |+----------+----------------------------------+|    id    | 625b81ae9f024366bbe023a62ab8a18d ||   name   |              admin               |+----------+----------------------------------+ To create the Member role, we repeat the step and specify the Member role: # Member role keystone role-create --name Member How it works... Creation of the roles is simply achieved by using the keystone client and specifying the role-create option with the following syntax: keystone role-create --name role_name The role_name attribute can't be arbitrary for admin and Member roles. 
The admin role has been set by default in /etc/keystone/policy.json as having administrative rights: {     "admin_required": [["role:admin"], ["is_admin:1"]] } The Member role is also configured by default in the OpenStack Dashboard, Horizon, for a non-admin user created through the web interface. On creation of the role, the ID associated with is returned, and we can use it when assigning roles to users. To see a list of roles and the associated IDs in our environment, we can issue the following command: keystone role-list Adding users to Keystone Adding users to the OpenStack Identity service requires that the user has a tenant that they can exist in and there is a defined role that can be assigned to them. Here, we will create two users. The first user will be named admin and will have the admin role assigned to them in the cookbook tenant. The second user will be named demo and will have the Member role assigned to them in the same cookbook tenant. Getting ready We will be using the keystone client to operate Keystone. If the python-keystoneclient tool isn't available, follow the steps described at http://bit.ly/OpenStackCookbookClientInstall. Ensure that we have our environment set correctly to access our OpenStack environment for administrative purposes: export OS_TENANT_NAME=cookbook export OS_USERNAME=admin export OS_PASSWORD=openstack export OS_AUTH_URL=https://192.168.100.200:5000/v2.0/ export OS_NO_CACHE=1 export OS_KEY=/vagrant/cakey.pem export OS_CACERT=/vagrant/ca.pem You can use the controller node if no other machines are available on your network, as this has the python-keystoneclient and the relevant access to the OpenStack environment. If you are using the Vagrant environment, issue the following command to get access to the Controller: vagrant ssh controller How to do it... To create the required users in our OpenStack environment, perform the following steps: To create a user in the cookbook tenant, we first need to get the cookbook tenant ID. To do this, issue the following command, which we conveniently store in a variable named TENANT_ID with the tenant-list option: TENANT_ID=$(keystone tenant-list \     | awk '/\ cookbook\ / {print $2}') Now that we have the tenant ID, the admin user in the cookbook tenant is created using the user-create option and a password is chosen for the user: PASSWORD=openstack keystone user-create \     --name admin \     --tenant_id $TENANT_ID \     --pass $PASSWORD \     --email root@localhost \    --enabled true The preceding code will produce the following output: +----------+----------------------------------+| Property |              Value               |+----------+----------------------------------+|  email   |          root@localhost          || enabled  |               True               ||    id    | 2e23d0673e8a4deabe7c0fb70dfcb9f2 ||   name   |              admin               || tenantId | 14e34722ac7b4fe298886371ec17cf40 || username |              admin               |+----------+----------------------------------+ As we are creating the admin user, which we are assigning the admin role, we need the admin role ID. We pick out the ID of the admin role and conveniently store it in a variable to use it when assigning the role to the user with the role-list option: ROLE_ID=$(keystone role-list \    | awk '/\ admin\ / {print $2}') To assign the role to our user, we need to use the user ID that was returned when we created that user. 
To get this, we can list the users and pick out the ID for that particular user with the following user-list option: USER_ID=$(keystone user-list \     | awk '/\ admin\ / {print $2}') With the tenant ID, user ID, and an appropriate role ID available, we can assign that role to the user with the following user-role-add option: keystone user-role-add \     --user $USER_ID \     --role $ROLE_ID \     --tenant_id $TENANT_ID Note that there is no output produced on successfully running this command. The admin user also needs to be in the admin tenant for us to be able to administer the complete environment. To do this, we need to get the admin tenant ID and then repeat the previous step using this new tenant ID: ADMIN_TENANT_ID=$(keystone tenant-list \     | awk '/\ admin\ / {print $2}') keystone user-role-add \     --user $USER_ID \     --role $ROLE_ID \     --tenant_id $ADMIN_TENANT_ID To create the demo user in the cookbook tenant with the Member role assigned, we repeat the process defined in steps 1 to 5: # Get the cookbook tenant ID TENANT_ID=$(keystone tenant-list \     | awk '/\ cookbook\ / {print $2}')   # Create the user PASSWORD=openstack keystone user-create \     --name demo \     --tenant_id $TENANT_ID \     --pass $PASSWORD \     --email demo@localhost \     --enabled true   # Get the Member role ID ROLE_ID=$(keystone role-list \     | awk '/\ Member\ / {print $2}')   # Get the demo user ID USER_ID=$(keystone user-list \     | awk '/\ demo\ / {print $2}')   # Assign the Member role to the demo user in cookbook keystone user-role-add \     --user $USER_ID \     -–role $ROLE_ID \     --tenant_id $TENANT_ID How it works... Adding users in the OpenStack Identity service involves a number of steps and dependencies. First, a tenant is required for the user to be part of. Once the tenant exists, the user can be added. At this point, the user has no role associated, so the final step is to designate the role to this user, such as Member or admin. Use the following syntax to create a user with the user-create option: keystone user-create \     --name user_name \     --tenant_id TENANT_ID \     --pass PASSWORD \     --email email_address \     --enabled true The user_name attribute is an arbitrary name but cannot contain any spaces. A password attribute must be present. In the previous examples, these were set to openstack. The email_address attribute must also be present. To assign a role to a user with the user-role-add option, use the following syntax: keystone user-role-add \     --user USER_ID \     --role ROLE_ID \     --tenant_id TENANT_ID This means that we need to have the ID of the user, the ID of the role, and the ID of the tenant in order to assign roles to users. These IDs can be found using the following commands: keystone tenant-list keystone user-list keystone role-list Summary In this article, we have looked at the basic operations with respect to Keystone, such as creating tenants, configuring roles, and adding users. To know everything else about cloud computing with OpenStack, check out OpenStack Cloud Computing Cookbook, Third Edition, also currently being used at CERN! The book has chapters on the Identity Service, Image Service, Networking, Object Storage, Block Storage, as well as how to manage OpenStack in production environments! It’s everything you need and more to make your job so much easier! 
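The recipes above drive Keystone through the CLI; the same workflow can also be scripted. The following is a hedged sketch using the legacy python-keystoneclient v2.0 API, matching the v2.0 auth URL used in this article. The new tenant and user names are illustrative assumptions, and the exact client keyword arguments may differ between releases:

# Sketch: create a tenant, create a user inside it, and assign the existing
# Member role, mirroring the keystone CLI recipes above.
from keystoneclient.v2_0 import client

keystone = client.Client(
    username="admin",
    password="openstack",
    tenant_name="cookbook",
    auth_url="https://192.168.100.200:5000/v2.0/",
    cacert="/vagrant/ca.pem",
)

# New tenant (using a new name to avoid clashing with the existing cookbook tenant)
tenant = keystone.tenants.create(
    tenant_name="cookbook2",
    description="Second Cookbook Tenant",
    enabled=True,
)

# Reuse the Member role created earlier instead of creating a duplicate
member_role = next(r for r in keystone.roles.list() if r.name == "Member")

user = keystone.users.create(
    name="demo2",
    password="openstack",
    email="demo2@localhost",
    tenant_id=tenant.id,
    enabled=True,
)
keystone.roles.add_user_role(user, member_role, tenant)

print(tenant.id, user.id, member_role.id)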
Resources for Article:  Further resources on this subject:  Introducing OpenStack Trove [article] OpenStack Performance, Availability [article] Concepts for OpenStack [article]

article-image-openstack-performance-availability
Packt
17 Feb 2016
21 min read
Save for later

OpenStack Performance, Availability

Packt
17 Feb 2016
21 min read
In this article by Tony Campbell, author of the book Troubleshooting OpenStack, we will cover some of the chronic issues that might be early signs of trouble. This article is more about prevention and aims to help you avoid emergency troubleshooting as much as possible. (For more resources related to this topic, see here.)

Database
Many OpenStack services make heavy use of a database. Production deployments will typically use MySQL or Postgres as the backend database server. As you may have learned, a failing or misconfigured database will quickly lead to trouble in your OpenStack cluster. Database problems can also present more subtle concerns that may grow into huge problems if neglected.

Availability
The database server can become a single point of failure if it is not deployed in a highly available configuration. OpenStack does not require a high availability installation of your database, and as a result, many installations may skip this step. However, production deployments of OpenStack should take care to ensure that their database can survive the failure of a single database server.

MySQL with Galera Cluster
For installations using the MySQL database engine, there are several options for clustering your installation. One popular method is to leverage Galera Cluster (http://galeracluster.com/). Galera Cluster for MySQL leverages synchronous replication and provides a multi-master cluster, which offers high availability for your OpenStack databases.

Postgres
Installations that use the Postgres database engine have several options for high availability, load balancing, and replication. These options include block device replication with DRBD, log shipping, Master-Standby replication based on triggers, statement-based replication, and asynchronous multi-master replication. For details, refer to the Postgres High Availability Guide (http://www.postgresql.org/docs/current/static/high-availability.html).

Performance
Database performance is one of those metrics that can degrade over time. For an administrator who does not pay attention to small problems in this area, they can eventually become large problems. A wise administrator will regularly monitor the performance of their database and will constantly be on the lookout for slow queries, high database loads, and other indications of trouble.

MySQL
There are several options for monitoring your MySQL server, some of which are commercial and many others that are open source. Administrators should evaluate the options available and select a solution that fits their current set of tools and operating environment. There are several performance metrics you will want to monitor.

Show Status
The MySQL SHOW STATUS statement can be executed from the mysql command prompt. The output of this statement is server status information with over 300 variables reported. To narrow down the information, you can leverage a LIKE clause on the variable_name to display the sections you are interested in.
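For example, a small monitoring probe might pull only the thread counters with such a LIKE filter and hand them to your monitoring system. The following Python sketch assumes the PyMySQL driver, and the host, credentials, and the choice of the Threads% filter are placeholders rather than anything prescribed here:

# Hedged sketch: poll a handful of MySQL status counters for monitoring.
import pymysql

# Connection details are placeholders; point them at your database host.
conn = pymysql.connect(host="localhost", user="monitor",
                       password="secret", database="mysql")
try:
    with conn.cursor() as cur:
        cur.execute("SHOW GLOBAL STATUS LIKE 'Threads%'")
        for variable_name, value in cur.fetchall():
            # For example: Threads_connected, Threads_running, ...
            print(f"{variable_name} = {value}")
finally:
    conn.close()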
Here is an abbreviated list of the output variables returned by the full SHOW STATUS statement:

mysql> SHOW STATUS;
+------------------------------------------+-------------+
| Variable_name                            | Value       |
+------------------------------------------+-------------+
| Aborted_clients                          | 29          |
| Aborted_connects                         | 27          |
| Binlog_cache_disk_use                    | 0           |
| Binlog_cache_use                         | 0           |
| Binlog_stmt_cache_disk_use               | 0           |
| Binlog_stmt_cache_use                    | 0           |
| Bytes_received                           | 614         |
| Bytes_sent                               | 33178       |

Mytop
Mytop is a command-line utility inspired by the Linux top command. Mytop retrieves data from the MySQL SHOW PROCESSLIST command and the SHOW STATUS command. Data from these commands is refreshed, processed, and displayed in the output of the Mytop command. The Mytop output includes a header, which contains summary data, followed by a thread section.

Mytop header section
Here is an example of the header output from the Mytop command:

MySQL on localhost (5.5.46)          load 1.01 0.85 0.79 4/538 23573 up 5+02:19:24 [14:35:24]
 Queries: 3.9M     qps:    9 Slow:     0.0         Se/In/Up/De(%):    49/00/08/00
 Sorts:     0 qps now:   10 Slow qps: 0.0  Threads:   30 (   1/   4) 40/00/12/00
 Cache Hits: 822.0 Hits/s:  0.0 Hits now:   0.0  Ratio:  0.0%  Ratio now:  0.0%
 Key Efficiency: 97.3%  Bps in/out:  1.7k/ 3.1k   Now in/out:  1.0k/ 3.9k

As demonstrated in the preceding output, the header section for the Mytop command includes the following information:
The hostname and MySQL version
The server load
The MySQL server uptime
The total number of queries
The average number of queries
Slow queries
The percentage of Select, Insert, Update, and Delete queries
Queries per second
Threads
Cache hits
Key efficiency

Mytop thread section
The Mytop thread section will list as many threads as it can display.
The threads are ordered by the Time column, which displays the threads' idle time:

    Id      User          Host/IP        DB     Time   Cmd    State Query
    --      ----          -------        --     ----   ---    ----- ----------
  3461   neutron   174.143.201.98   neutron   5680   Sleep
  3477    glance   174.143.201.98    glance   1480   Sleep
  3491      nova   174.143.201.98      nova    880   Sleep
  3512      nova   174.143.201.98      nova    281   Sleep
  3487  keystone   174.143.201.98  keystone    280   Sleep
  3489    glance   174.143.201.98    glance    280   Sleep
  3511  keystone   174.143.201.98  keystone    280   Sleep
  3513   neutron   174.143.201.98   neutron    280   Sleep
  3505  keystone   174.143.201.98  keystone    279   Sleep
  3514  keystone   174.143.201.98  keystone    141   Sleep
  ...

The Mytop thread section displays the ID of each thread followed by the user and host. Finally, this section will display the database, idle time, and state or command query. Mytop will allow you to keep an eye on the performance of your MySQL database server.

Percona toolkit
The Percona Toolkit is a very useful set of command-line tools for performing MySQL operations and system tasks. The toolkit can be downloaded from Percona at https://www.percona.com/downloads/percona-toolkit/. The output from these tools can be fed into your monitoring system, allowing you to effectively monitor your MySQL installation.

Postgres
Like MySQL, the Postgres database also has a series of tools that can be leveraged to monitor database performance. In addition to standard Linux troubleshooting tools, such as top and ps, Postgres also offers its own collection of statistics.

The PostgreSQL statistics collector
The statistics collector in Postgres allows you to collect data about server activity.
The statistics collected by this tool are varied and may be helpful for troubleshooting or system monitoring. In order to leverage the statistics collector, you must turn on the functionality in the postgresql.conf file. The settings are commented out by default in the RUNTIME STATISTICS section of the configuration file. Uncomment the lines in the Query/Index Statistics Collector subsection:

#------------------------------------------------------------------------------
# RUNTIME STATISTICS
#------------------------------------------------------------------------------

# - Query/Index Statistics Collector -

track_activities = on
track_counts = on
track_io_timing = off
track_functions = none                 # none, pl, all
track_activity_query_size = 1024       # (change requires restart)
update_process_title = on
stats_temp_directory = 'pg_stat_tmp'

Once the statistics collector is configured, restart the database server or execute a pg_ctl reload for the configuration to take effect. Once the collector has been configured, there will be a series of views created and named with the prefix "pg_stat". These views can be queried for relevant statistics in the Postgres database server.

Database backups
A diligent operator will ensure that a backup of the database for each OpenStack project is created. Since most OpenStack services make heavy use of the database for persisting things such as states and metadata, a corruption or loss of data can render your OpenStack cloud unusable. Recent database backups can help rescue you from this fate. MySQL users can use the mysqldump utility to back up all of the OpenStack databases:

mysqldump --opt --all-databases > all_openstack_dbs.sql

Similarly, Postgres users can back up all OpenStack databases with a command similar to the following:

pg_dumpall > all_openstack_dbs.sql

Your cadence for backups will depend on your environment and tolerance for data corruption or loss. You should store these backups in a safe place and occasionally perform test restores from them to ensure that they work as expected.

Monitoring
Monitoring is often your early warning system that something is going wrong in your cluster. Your monitoring system can also be a rich source of information when the time comes to troubleshoot issues with the cluster. There are multiple options available for monitoring OpenStack. Many of your current application monitoring platforms will handle OpenStack just as well as any other Linux system. Regardless of the tool you select for monitoring, there are several parts of OpenStack that you should focus on.

Resource monitoring
OpenStack is typically deployed on a series of Linux servers. Monitoring the resources on those servers is essential. A "set it and forget it" attitude is a recipe for disaster. The things you may want to monitor on your host servers include the following:
CPU
Disk
Memory
Log file size
Network I/O
Database
Message broker

OpenStack quotas
OpenStack operators have the option to set usage quotas for each tenant/project. As an administrator, it is helpful to monitor a project's usage as it pertains to these quotas. Once users reach a quota, they may not be able to deploy additional resources. Users may misinterpret this as an error in the system and report it to you. By keeping an eye on the quotas, you can proactively warn users as they reach their thresholds, or you can decide to increase the quotas as appropriate. Some of the services have client commands that can be used to retrieve quota statistics.
As an example, we demonstrate the nova absolute-limits command here:

nova absolute-limits
+--------------------+------+-------+
| Name               | Used | Max   |
+--------------------+------+-------+
| Cores              | 1    | 20    |
| FloatingIps        | 0    | 10    |
| ImageMeta          | -    | 128   |
| Instances          | 1    | 10    |
| Keypairs           | -    | 100   |
| Personality        | -    | 5     |
| Personality Size   | -    | 10240 |
| RAM                | 512  | 51200 |
| SecurityGroupRules | -    | 20    |
| SecurityGroups     | 1    | 10    |
| Server Meta        | -    | 128   |
| ServerGroupMembers | -    | 10    |
| ServerGroups       | 0    | 10    |
+--------------------+------+-------+

The absolute-limits command in Nova is nice because it displays the project's current usage alongside the quota maximum, making it easy to notice when a project/tenant is coming close to the limit.

RabbitMQ
RabbitMQ is the default message broker used in OpenStack installations. However, if it is installed as is, out of the box, it can become a single point of failure. Administrators should consider clustering RabbitMQ and activating mirrored queues.

Summary
OpenStack is the leading open source software for running private clouds. Its popularity has grown exponentially since it was founded by Rackspace and NASA. The output of this engaged community is staggering, resulting in plenty of new features finding their way into OpenStack with each release. The project is at a size now where no one can truly know the details of each service. When working with such a complex project, it is inevitable that you will run into problems, bugs, errors, issues, and plain old trouble.

Resources for Article:
Further resources on this subject:
Concepts for OpenStack [article]
Implementing OpenStack Networking and Security [article]
Using the OpenStack Dashboard [article]
article-image-elastic-load-balancing
Packt
09 Feb 2016
21 min read
Save for later

Elastic Load Balancing

Packt
09 Feb 2016
21 min read
In this article by Yohan Wadia, the author of the book AWS Administration – The Definitive Guide, we are going to continue where we last dropped off and introduce an amazing concept called Auto Scaling! AWS has been one of the first public cloud providers to offer this feature, and it really is something that you must try out and use in your environments! This article will teach you the basics of Auto Scaling, its concepts and terminologies, and even how to create an auto scaled environment using AWS. It will also cover Amazon Elastic Load Balancers and how you can use them in conjunction with Auto Scaling to manage your applications more effectively! So without wasting any more time, let's first get started by understanding what Auto Scaling is and how it actually works! (For more resources related to this topic, see here.)

An overview of Auto Scaling
We have been talking about AWS and the concept of dynamic scalability, a.k.a. Elasticity, in general throughout this book; well, now is the best time to look at it in depth with the help of Auto Scaling! Auto Scaling basically enables you to scale your compute capacity (EC2 instances) either up or down, depending on the conditions you specify. These conditions could be as simple as a number that maintains the count of your EC2 instances at any given time, or even complex conditions that measure the load and performance of your instances, such as CPU utilization, memory utilization, and so on. But a simple question that may arise here is why do I even need Auto Scaling? Is it really that important? To get a better understanding of things, let's take a look at a dummy application's load and performance graph in the following screenshot:

The graph to the left depicts the traditional approach that is usually taken to map an application's performance requirements with a fixed infrastructure capacity. Now, to meet this application's unpredictable performance requirement, you would have to plan and procure additional hardware upfront, as depicted by the red line. And since there is no guaranteed way to plan for unpredictable workloads, you generally end up procuring more than you need. This is a standard approach employed by many businesses and it doesn't come without its own set of problems. For example, the region highlighted in red is when most of the procured hardware capacity is idle and wasted, as the application simply does not have that high a requirement. There can also be cases where the procured hardware simply did not match the application's high performance requirements, as shown by the green region. All these issues, in turn, have an impact on your business, which frankly can prove to be quite expensive. That's where the elasticity of a Cloud comes into play. Rather than procuring at the nth hour and ending up with wasted resources, you grow and shrink your resources dynamically as per your application's requirements, as depicted in the graph on the right. This not only helps you in saving overall costs but also makes your application's management a lot easier and more efficient. And don't worry if your application does not have an unpredictable load pattern! Auto Scaling is designed to work with both predictable and unpredictable workloads, so no matter what application you may have, you can rest assured that the required compute capacity is always going to be made available when required.
Keeping that in mind, let us summarize some of the benefits that AWS Auto Scaling provides:
Cost Savings: By far the biggest advantage provided by Auto Scaling, you can actually gain a lot of control over the deployment of your instances as well as costs by launching instances only when they are needed and terminating them when they aren't required.
Ease of Use: AWS provides a variety of tools with which you can create and manage your Auto Scaling setup, such as the AWS CLI and the EC2 Management Dashboard. Auto Scaling can be programmatically created and managed via a simple and easy to use web service API as well.
Scheduled Scaling Actions: Apart from scaling instances as per a given policy, you can additionally schedule scaling actions to be executed in the future. This type of scaling comes in handy when your application's workload patterns are predictable and well known in advance.
Geographic Redundancy and Scalability: AWS Auto Scaling enables you to scale, distribute, as well as load balance your application automatically across multiple Availability Zones within a given region.
Easier Maintenance and Fault Tolerance: AWS Auto Scaling replaces unhealthy instances automatically based on predefined alarms and thresholds.

With these basics in mind, let us understand how Auto Scaling actually works out in AWS.

Auto scaling components
To get started with Auto Scaling on AWS, you will be required to work with three primary components, each described briefly as follows.

Auto scaling group
An Auto Scaling Group is a core component of the Auto Scaling service. It is basically a logical grouping of instances that share some common scaling characteristics between them. For example, a web application can contain a set of web server instances that form one Auto Scaling Group and another set of application server instances that become a part of another Auto Scaling Group, and so on. Each group has its own set of criteria, which includes the minimum and maximum number of instances that the Group should have, along with the desired number of instances that the group must have at all times.

Note: The desired number of instances is an optional field in an Auto Scaling Group. If the desired capacity value is not specified, then the Auto Scaling Group will consider the minimum number of instances value as the desired value instead.

Auto Scaling Groups are also responsible for performing periodic health checks on the instances contained within them. An instance with degraded health is then immediately swapped out and replaced by a new one by the Auto Scaling Group, thus ensuring that each of the instances within the Group works at optimum levels.

Launch configurations
A Launch Configuration is a set of blueprint statements that the Auto Scaling Group uses to launch instances. You can create a single Launch Configuration and use it with multiple Auto Scaling Groups; however, you can only associate one Launch Configuration with a single Auto Scaling Group at a time. What does a Launch Configuration contain? Well, to start off with, it contains the ID of the AMI that Auto Scaling uses to launch the instances in the Auto Scaling Group. It also contains additional information about your instances, such as the instance type, the security group it has to be associated with, block device mappings, key pairs, and so on. An important thing to note here is that once you create a Launch Configuration, there is no way you can edit it again.
The only way to make changes to a Launch Configuration is by creating a new one in its place and associating that with the Auto Scaling Group.

Scaling plans
With your Launch Configuration created, the final step left is to create one or more Scaling Plans. Scaling Plans describe how the Auto Scaling Group should actually scale. There are three scaling mechanisms you can use with your Auto Scaling Groups, each described as follows:
Manual Scaling: Manual Scaling is by far the simplest way of scaling your resources. All you need to do here is specify a new desired number of instances value or change the minimum or maximum number of instances in an Auto Scaling Group, and the rest is taken care of by the Auto Scaling service itself.
Scheduled Scaling: Scheduled Scaling is really helpful when it comes to scaling resources based on a particular time and date. This method of scaling is useful when the application's load patterns are highly predictable, and thus you know exactly when to scale up or down. For example, an application that processes a company's payroll cycle is usually load intensive towards the end of each month, so you can schedule the scaling requirements accordingly.
Dynamic Scaling: Dynamic Scaling, or scaling on demand, is used when your application's load and performance are unpredictable. With Dynamic Scaling, you generally provide a set of scaling policies using some criteria, for example, scale the instances in my Auto Scaling Group by 10 when the average CPU Utilization exceeds 75 percent for a period of 5 minutes. Sounds familiar, right? Well, that's because these dynamic scaling policies rely on Amazon CloudWatch to trigger scaling events. CloudWatch monitors the policy conditions and triggers the auto scaling events when certain thresholds are breached. In either case, you will require a minimum of two such scaling policies: one for scaling in (terminating instances) and one for scaling out (launching instances).

Before we go ahead and create our first Auto Scaling activity, we need to understand one additional AWS service that will help us balance and distribute the incoming traffic across our auto scaled EC2 instances. Enter the Elastic Load Balancer!

Introducing the Elastic Load Balancer
Elastic Load Balancer or ELB is a web service that allows you to automatically distribute incoming traffic across a fleet of EC2 instances. In simpler terms, an ELB acts as a single point of contact between your clients and the EC2 instances that are servicing them. The clients query your application via the ELB; thus, you can easily add and remove the underlying EC2 instances without having to worry about any of the traffic routing or load distribution. It is all taken care of by the ELB itself! Coupled with Auto Scaling, ELB provides you with a highly resilient and fault tolerant environment to host your applications. While the Auto Scaling service automatically removes any unhealthy EC2 instances from its Group, the ELB automatically reroutes the traffic to some other healthy instance. Once a new healthy instance is launched by the Auto Scaling service, ELB will once again re-route the traffic through it and balance out the application load as well. But the work of the ELB doesn't stop there! An ELB can also be used to safeguard and secure your instances by enforcing encryption and by utilizing only HTTPS and SSL connections. Keeping these points in mind, let us look at how an ELB actually works.
Well, to begin with, when you create an ELB in a particular AZ, you are actually spinning up one or more ELB nodes. Don't worry, you cannot physically see these nodes, nor can you perform any actions on them. They are completely managed and looked after by AWS itself. This node is responsible for forwarding the incoming traffic to the healthy instances present in that particular AZ. Now here's the fun part! If you configure the ELB to work across multiple AZs and assume that one entire AZ goes down or the instances in that particular AZ become unhealthy for some reason, then the ELB will automatically route traffic to the healthy instances present in the second AZ.

How does it do the routing? The ELB by default is provided with a Public DNS name, something similar to MyELB-123456789.region.elb.amazonaws.com. The clients send all their requests to this particular Public DNS name. The AWS DNS Servers then resolve this public DNS name to the public IP addresses of the ELB nodes. Each of the nodes has one or more Listeners configured on them, which constantly check for any incoming connections. A Listener is nothing but a process that is configured with a combination of a protocol, for example, HTTP, and a port, for example, 80. The ELB node that receives the particular request from the client then routes the traffic to a healthy instance using a particular routing algorithm. If the Listener was configured with the HTTP or HTTPS protocol, then the preferred choice of routing algorithm is the least outstanding requests routing algorithm.

Note: If you had configured your ELB with a TCP listener, then the preferred routing algorithm is Round Robin.

Confused? Well, don't be, as most of these things are handled internally by the ELB itself. You don't have to configure the ELB nodes nor the routing tables. All you need to do is set up the Listeners in your ELB and point all client requests to the ELB's Public DNS name, that's it! Keeping these basics in mind, let us go ahead and create our very first ELB!

Creating your first Elastic Load Balancer
Creating and setting up an ELB is a fairly easy and straightforward process provided you have planned and defined your Elastic Load Balancer's role from the start. The current version of ELB supports HTTP, HTTPS, TCP, as well as SSL connection protocols; however, for the sake of simplicity, we will be creating a simple ELB for balancing HTTP traffic only. I'll be using the same VPC environment that we have been developing since Chapter 5, Building your Own Private Clouds using Amazon VPC; however, you can easily substitute your own infrastructure in this place as well.

To access the ELB Dashboard, you will have to first access the EC2 Management Console. Next, from the navigation pane, select the Load Balancers option, as shown in the following screenshot. This will bring up the ELB Dashboard, using which you can create and associate your ELBs. An important point to note here is that although ELBs are created using this particular portal, you can use them for both your EC2 and VPC environments. There is no separate portal for creating ELBs in a VPC environment. Since this is our first ELB, let us go ahead and select the Create Load Balancer option. This will bring up a seven-step wizard using which you can create and customize your ELBs.

Step 1 – Defining Load Balancer
To begin with, provide a suitable name for your ELB in the Load Balancer name field. In this case, I have opted to stick to my naming convention and named the ELB US-WEST-PROD-LB-01.
Next up, select the VPC option in which you wish to deploy your ELB. Again, I have gone ahead and selected the US-WEST-PROD-1 (192.168.0.0/16) VPC that we created in Chapter 5, Building your Own Private Clouds using Amazon VPC. You can alternatively select your own VPC environment or even select a standalone EC2 environment if it is available. Do not check the Create an internal load balancer option, as in this scenario we are creating an Internet-facing ELB for our Web Server instances.

There are two types of ELBs that you can create and use based on your requirements. The first is an Internet-facing Load Balancer, which is used to balance out client requests that are inbound from the Internet. Ideally, such Internet-facing load balancers connect to the Public Subnets of a VPC. Similarly, you also have something called Internal Load Balancers that connect and route traffic to your Private Subnets. You can use a combination of these depending on your application's requirements and architecture, for example, you can have one Internet-facing ELB as your application's main entry point and an internal ELB to route traffic between your Public and Private Subnets; however, for simplicity, let us create an Internet-facing ELB for now.

With these basic settings done, we now provide our ELB's Listeners. A Listener is made up of two parts: a protocol and port number for your frontend connection (between your Client and the ELB), and a protocol and a port number for a backend connection (between the ELB and the EC2 instances). In the Listener Configuration section, select HTTP from the Load Balancer Protocol dropdown list and provide the port number 80 in the Load Balancer Port field, as shown in the following screenshot. Provide the same protocol and port number for the Instance Protocol and Instance Port fields as well. What does this mean? Well, this listener is now configured to listen on the ELB's external port (Load Balancer Port) 80 for any client's requests. Once it receives the requests, it will then forward them to the underlying EC2 instances using the Instance Port, which in this case is port 80 as well. There is no rule of thumb that says both port values must match; in fact, it is actually a good practice to keep them different. Although your ELB can listen on port 80 for any client's requests, it can use any port within the range of 1-65,535 for forwarding the request to the instances. You can optionally add additional listeners to your ELB, such as a listener for the HTTPS protocol running on port 443 as well; however, that is something that I will leave you to do later.

The final configuration item left in step 1 is where you get to select the Subnets option to be associated with your new Load Balancer. In my case, I have gone ahead and created a set of subnets, each in two different AZs, so as to mimic a high availability scenario. Select any particular Subnets and add them to your ELB by selecting the adjoining + sign. In my case, I have selected two Subnets, both belonging to the web server instances, but present in two different AZs.

Note: You can select a single Subnet as well; however, it is highly recommended that you go for a highly available architecture, as described earlier.

Once your subnets are added, click on Next: Assign Security Groups to continue over to step 2.

Step 2 – Assign Security Groups
Step 2 is where we get to assign a Security Group to our ELB.
Now here's a catch: You will not be prompted for a Security Group if you are using an EC2-Classic environment for your ELB. This Security Group is only necessary for VPC environments and will basically allow the port you designated for inbound traffic to pass through. In this case, I have created a new dedicated Security Group for the ELB. Provide a suitable Security group name as well as Description, as shown in the preceding screenshot. The new security group already contains a rule that allows traffic to the port that you configured your Load Balancer to use, in my case, port 80. Leave the rule at its default value and click on Next: Configure Security Settings to continue.

Step 3 – Configure Security Settings
This is an optional page that basically allows you to secure your ELB by using either the HTTPS or the SSL protocol for your frontend connection. But since we have opted for a simple HTTP-based ELB, we can ignore this page for now. Click on Next: Configure Health Check to proceed to the next step.

Step 4 – Configure Health Check
Health Checks are a very important part of an ELB's configuration and hence you have to be extra cautious when setting them up. What are Health Checks? To put it in simple terms, these are basic tests that the ELB conducts to ensure that your underlying EC2 instances are healthy and running optimally. These tests include simple pings, attempted connections, or even sending requests. If the ELB finds any of the EC2 instances in an unhealthy state, it immediately changes its Health Check Status to OutOfService. Once the instance is marked as OutOfService, the ELB no longer routes any traffic to it. The ELB will only start sending traffic back to the instance if its Health Check State changes to InService again.

To configure the Health Checks for your ELB, fill in the following information as described here:
Ping Protocol: This field indicates which protocol the ELB should use to connect to your EC2 instances. You can use the TCP, HTTP, HTTPS, or the SSL options; however, for simplicity, I have selected the HTTP protocol here.
Ping Port: This field is used to indicate the port which the ELB should use to connect to the instance. You can supply any port value from the range 1 to 65,535; however, since we are using the HTTP protocol, I have opted to stick with the default value of port 80. This port value is really essential as the ELB will periodically ping the EC2 instances on this port number. If any instance does not reply back in a timely fashion, then that instance will be deemed unhealthy by the ELB.
Ping Path: This value is usually used for the HTTP and HTTPS protocols. The ELB sends a simple GET request to the EC2 instances based on the Ping Port and Ping Path. If the ELB receives a response other than an "OK," then that particular instance is deemed to be unhealthy by the ELB and it will no longer route any traffic to it. Ping Paths are generally set with a forward slash "/", which indicates the default home page of a web server. However, you can also use a /index.html or a /default.html value as you see fit. In my case, I have provided the /index.php value, as my dummy web application is actually a PHP app.

Besides the Ping checks, there are also a few advanced configuration details that you can configure based on your application's health check needs:
Response Time: The Response Time is the time the ELB has to wait in order to receive a response. The default value is 5 seconds, with a maximum value of 60 seconds.
Let's take a look at the following screenshot:
Health Check Interval: This field indicates the amount of time (in seconds) the ELB waits between health checks of an individual EC2 instance. The default value is 30 seconds; however, you can specify a maximum value of 300 seconds as well.
Unhealthy Threshold: This field indicates the number of consecutive failed health checks required before the ELB declares an instance unhealthy. The default value is 2, with a maximum threshold value of 10.
Healthy Threshold: This field indicates the number of consecutive successful health checks required before the ELB declares an instance healthy. The default value is 2, with a maximum threshold value of 10.

Once you have provided your values, go ahead and select the Next: Add EC2 Instances option.

Step 5 – Add EC2 Instances
In this section of the Wizard, you can select any running instance from your Subnets to be added and registered with the ELB. But since we are setting this particular ELB up for use with Auto Scaling, we will leave this section for now. Click on Next: Add Tags to proceed with the wizard.

Step 6 – Add Tags
We already know the importance of tagging our AWS resources, so go ahead and provide a suitable tag for categorizing and identifying your ELB. Note that you can always add, edit, and remove tags at a later time as well using the ELB Dashboard. With the Tags all set up, click on Review and Create.

Step 7 – Review and Create
The final step of our ELB creation wizard is where we simply review our ELB's settings, including the Health Checks, EC2 instances, Tags, and so on. Once reviewed, click on Create to begin your ELB's creation and configuration. The ELB takes a few seconds to get created, but once it's ready, you can view and manage it just like any other AWS resource using the ELB Dashboard, as shown in the following screenshot:

Select the newly created ELB and view its details in the Description tab. Make a note of the ELB's public DNS Name as well. You can optionally even view the Status as well as the ELB Scheme (whether Internet-facing or internal) using the Description tab. You can also view the ELB's Health Checks as well as the Listeners configured with your ELB.

Before we proceed with the next section of this article, here are a few important pointers to keep in mind when working with ELB. Firstly, the configurations that we performed on our ELB are all very basic and will help you to get through the basics; however, ELB also provides additional advanced configuration options such as Cross-Zone Load Balancing, Proxy Protocol, and Sticky Sessions, which can all be configured using the ELB Dashboard. To know more about these advanced settings, refer to http://docs.aws.amazon.com/ElasticLoadBalancing/latest/DeveloperGuide/elb-configure-load-balancer.html. The second important thing worth mentioning is the ELB's cost. Although it is free (Terms and Conditions apply) to use under the Free Tier eligibility, ELBs are charged approximately $0.025 per hour used. There is a nominal data transfer charge as well, which is approximately $0.008 per GB of data processed.

Summary
I really hope that you have learned as much as possible about Amazon ELB. We talked about the importance of Auto Scaling and how it proves to be super beneficial when compared to the traditional mode of scaling infrastructure. We then learnt a bit about AWS Auto Scaling and its core components.
Next, we learnt about a new service offering called Elastic Load Balancers and saw how easy it is to deploy one for our own use.

Resources for Article:
Further resources on this subject:
Achieving High-Availability on AWS Cloud [article]
Amazon Web Services [article]
Patterns for Data Processing [article]

article-image-working-ceph-block-device
Packt
05 Feb 2016
29 min read
Save for later

Working with Ceph Block Device

Packt
05 Feb 2016
29 min read
In this article by Karan Singh, the author of the book Ceph Cookbook, we will see how storage space or capacity is assigned to physical or virtual servers in detail. We'll also cover the various storage formats supported by Ceph. In this article, we will cover the following recipes:
Working with the RADOS Block Device
Configuring the Ceph client
Creating RADOS Block Device
Mapping RADOS Block Device
Ceph RBD Resizing
Working with RBD snapshots
Working with RBD clones
A quick look at OpenStack
Ceph – the best match for OpenStack
Configuring OpenStack as Ceph clients
Configuring Glance for the Ceph backend
Configuring Cinder for the Ceph backend
Configuring Nova to attach the Ceph RBD
Configuring Nova to boot the instance from the Ceph RBD
(For more resources related to this topic, see here.)

Once you have installed and configured your Ceph storage cluster, the next task is performing storage provisioning. Storage provisioning is the process of assigning storage space or capacity to physical or virtual servers, either in the form of block, file, or object storage. A typical computer system or server comes with a limited local storage capacity that might not be enough for your data storage needs. Storage solutions such as Ceph provide virtually unlimited storage capacity to these servers, making them capable of storing all your data and making sure that you do not run out of space. Using a dedicated storage system instead of local storage gives you the much needed flexibility in terms of scalability, reliability, and performance.

Ceph can provision storage capacity in a unified way, which includes block, filesystem, and object storage. The following diagram shows the storage formats supported by Ceph, and depending on your use case, you can select one or more storage options: We will discuss each of these options in detail in this article, and we will focus mainly on Ceph block storage.

Working with the RADOS Block Device
The RADOS Block Device (RBD), which is now known as the Ceph Block Device, provides reliable, distributed, and high performance block storage disks to clients. A RADOS block device makes use of the librbd library and stores a block of data in sequential form, striped over multiple OSDs in a Ceph cluster. RBD is backed by the RADOS layer of Ceph, thus every block device is spread over multiple Ceph nodes, delivering high performance and excellent reliability. RBD has native Linux kernel support, which means that RBD drivers have been well integrated with the Linux kernel for the past few years. In addition to reliability and performance, RBD also provides enterprise features such as full and incremental snapshots, thin provisioning, copy-on-write cloning, dynamic resizing, and so on. RBD also supports in-memory caching, which drastically improves its performance.

The industry leading open source hypervisors, such as KVM and Xen, provide full support for RBD and leverage its features for their guest virtual machines. Other proprietary hypervisors, such as VMware and Microsoft Hyper-V, will be supported very soon. There has been a lot of work going on in the community to support these hypervisors. The Ceph block device provides full support for cloud platforms such as OpenStack and CloudStack, as well as others. It has been proven successful and feature-rich for these cloud platforms. In OpenStack, you can use the Ceph block device with cinder (block) and glance (imaging) components.
By doing so, you can spin up thousands of Virtual Machines (VMs) in very little time, taking advantage of the copy-on-write feature of Ceph block storage. All these features make RBD an ideal candidate for cloud platforms such as OpenStack and CloudStack. We will now learn how to create a Ceph block device and make use of it.

Configuring the Ceph client
Any regular Linux host (RHEL- or Debian-based) can act as a Ceph client. The client interacts with the Ceph storage cluster over the network to store or retrieve user data. Ceph RBD support has been added to the Linux mainline kernel, starting with 2.6.34 and later versions.

How to do it
As we have done earlier, we will set up a Ceph client machine using vagrant and VirtualBox. We will use the Vagrantfile. Vagrant will then launch an Ubuntu 14.04 virtual machine that we will configure as a Ceph client:

From the directory where we have cloned the ceph-cookbook git repository, launch the client virtual machine using Vagrant:
$ vagrant status client-node1
$ vagrant up client-node1

Log in to client-node1:
$ vagrant ssh client-node1

Note: The username and password that vagrant uses to configure virtual machines is vagrant, and vagrant has sudo rights. The default password for the root user is vagrant.

Check the OS and kernel release (this is optional):
$ lsb_release -a
$ uname -r

Check for RBD support in the kernel:
$ sudo modprobe rbd

Allow the ceph-node1 monitor machine to access client-node1 over ssh. To do this, copy the root user's ssh keys from ceph-node1 to the vagrant user on client-node1. Execute the following commands from the ceph-node1 machine until otherwise specified:
## Login to ceph-node1 machine
$ vagrant ssh ceph-node1
$ sudo su -
# ssh-copy-id vagrant@client-node1

Provide the one-time vagrant user password, that is, vagrant, for client-node1. Once the ssh keys are copied from ceph-node1 to client-node1, you should be able to log in to client-node1 without a password.

Use the ceph-deploy utility from ceph-node1 to install the Ceph binaries on client-node1:
# cd /etc/ceph
# ceph-deploy --username vagrant install client-node1

Copy the Ceph configuration file (ceph.conf) to client-node1:
# ceph-deploy --username vagrant config push client-node1

The client machine will require Ceph keys to access the Ceph cluster. Ceph creates a default user, client.admin, which has full access to the Ceph cluster. It's not recommended to share client.admin keys with client nodes. The better approach is to create a new Ceph user with separate keys and allow access to specific Ceph pools. In our case, we will create a Ceph user, client.rbd, with access to the rbd pool. By default, Ceph block devices are created on the rbd pool:
# ceph auth get-or-create client.rbd mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=rbd'

Add the key to the client-node1 machine for the client.rbd user:
# ceph auth get-or-create client.rbd | ssh vagrant@client-node1 sudo tee /etc/ceph/ceph.client.rbd.keyring

By this step, client-node1 should be ready to act as a Ceph client. Check the cluster status from the client-node1 machine by providing the username and secret key:
$ vagrant ssh client-node1
$ sudo su -
# cat /etc/ceph/ceph.client.rbd.keyring >> /etc/ceph/keyring
### Since we are not using the default user client.admin, we need to supply the username that will connect to the Ceph cluster.
# ceph -s --name client.rbd

Creating RADOS Block Device
Till now, we have configured the Ceph client; now we will demonstrate creating a Ceph block device from the client-node1 machine.
How to do it
Create a RADOS Block Device named rbd1 of size 10240 MB:
# rbd create rbd1 --size 10240 --name client.rbd

There are multiple options that you can use to list RBD images:
## The default pool to store block device images is 'rbd'; you can also specify the pool name with the rbd command using the -p option:
# rbd ls --name client.rbd
# rbd ls -p rbd --name client.rbd
# rbd list --name client.rbd

Check the details of the rbd image:
# rbd --image rbd1 info --name client.rbd

Mapping RADOS Block Device
Now that we have created a block device on the Ceph cluster, in order to use it, we need to map it to the client machine. To do this, execute the following commands from the client-node1 machine.

How to do it
Map the block device to client-node1:
# rbd map --image rbd1 --name client.rbd

Check the mapped block device:
# rbd showmapped --name client.rbd

To make use of this block device, we should create a filesystem on it and mount it:
# fdisk -l /dev/rbd1
# mkfs.xfs /dev/rbd1
# mkdir /mnt/ceph-disk1
# mount /dev/rbd1 /mnt/ceph-disk1
# df -h /mnt/ceph-disk1

Test the block device by writing data to it:
# dd if=/dev/zero of=/mnt/ceph-disk1/file1 count=100 bs=1M

To map the block device across reboots, you should add the init-rbdmap script to the system startup, add the Ceph user and keyring details to /etc/ceph/rbdmap, and finally, update the /etc/fstab file:
# wget https://raw.githubusercontent.com/ksingh7/ceph-cookbook/master/rbdmap -O /etc/init.d/rbdmap
# chmod +x /etc/init.d/rbdmap
# update-rc.d rbdmap defaults
## Make sure you use the correct keyring value in the /etc/ceph/rbdmap file, which is generally unique for an environment.
# echo "rbd/rbd1 id=rbd,keyring=AQCLEg5VeAbGARAAE4ULXC7M5Fwd3BGFDiHRTw==" >> /etc/ceph/rbdmap
# echo "/dev/rbd1 /mnt/ceph-disk1 xfs defaults,_netdev 0 0" >> /etc/fstab
# mkdir /mnt/ceph-disk1
# /etc/init.d/rbdmap start

Ceph RBD Resizing
Ceph supports thin provisioned block devices, which means that the physical storage space will not get occupied until you begin storing data on the block device. The Ceph RADOS block device is very flexible; you can increase or decrease the size of an RBD on the fly from the Ceph storage end. However, the underlying filesystem should support resizing. Advanced filesystems such as XFS, Btrfs, EXT, ZFS, and others support filesystem resizing to a certain extent. Please follow the filesystem-specific documentation to learn more about resizing.

How to do it
To increase or decrease the Ceph RBD image size, use the --size <New_Size_in_MB> option with the rbd resize command; this will set the new size for the RBD image. The original size of the RBD image that we created earlier was 10 GB. We will now increase its size to 20 GB:
# rbd resize --image rbd1 --size 20480 --name client.rbd
# rbd info --image rbd1 --name client.rbd

Grow the filesystem so that we can make use of the increased storage space. It's worth knowing that the filesystem resize is a feature of the OS as well as the device filesystem. You should read the filesystem documentation before resizing any partition. The XFS filesystem supports online resizing. Check the system messages to verify the filesystem size change:
# dmesg | grep -i capacity
# xfs_growfs -d /mnt/ceph-disk1

Working with RBD Snapshots
Ceph extends full support to snapshots, which are point-in-time, read-only copies of an RBD image. You can preserve the state of a Ceph RBD image by creating snapshots and restoring the snapshot to get the original data.

How to do it
Let's see how a snapshot works with Ceph.
To test the snapshot functionality of Ceph, let's create a file on the block device that we created earlier:
# echo "Hello Ceph This is snapshot test" > /mnt/ceph-disk1/snapshot_test_file

Create a snapshot for the Ceph block device:
Syntax: rbd snap create <pool-name>/<image-name>@<snap-name>
# rbd snap create rbd/rbd1@snapshot1 --name client.rbd

To list the snapshots of an image, use the following:
Syntax: rbd snap ls <pool-name>/<image-name>
# rbd snap ls rbd/rbd1 --name client.rbd

To test the snapshot restore functionality of Ceph RBD, let's delete the files from the filesystem:
# rm -f /mnt/ceph-disk1/*

We will now restore the Ceph RBD snapshot to get back the files that we deleted in the last step. Please note that a rollback operation will overwrite the current version of the RBD image and its data with the snapshot version. You should perform this operation carefully:
Syntax: rbd snap rollback <pool-name>/<image-name>@<snap-name>
# rbd snap rollback rbd/rbd1@snapshot1 --name client.rbd

Once the snapshot rollback operation is completed, remount the Ceph RBD filesystem to refresh the filesystem state. You should be able to get your deleted files back:
# umount /mnt/ceph-disk1
# mount /dev/rbd1 /mnt/ceph-disk1
# ls -l /mnt/ceph-disk1

When you no longer need snapshots, you can remove a specific snapshot using the following syntax. Deleting the snapshot will not delete your current data on the Ceph RBD image:
Syntax: rbd snap rm <pool-name>/<image-name>@<snap-name>
# rbd snap rm rbd/rbd1@snapshot1 --name client.rbd

If you have multiple snapshots of an RBD image, and you wish to delete all the snapshots with a single command, then use the purge subcommand:
Syntax: rbd snap purge <pool-name>/<image-name>
# rbd snap purge rbd/rbd1 --name client.rbd

Working with RBD Clones
Ceph supports a very nice feature for creating Copy-On-Write (COW) clones from RBD snapshots. This is also known as Snapshot Layering in Ceph. Layering allows clients to create multiple instant clones of a Ceph RBD. This feature is extremely useful for cloud and virtualization platforms such as OpenStack, CloudStack, and Qemu/KVM. These platforms usually protect Ceph RBD images containing an OS / VM image in the form of a snapshot. Later, this snapshot is cloned multiple times to spawn new virtual machines / instances. Snapshots are read-only, but COW clones are fully writable; this feature of Ceph provides a greater level of flexibility and is extremely useful among cloud platforms.

Every cloned image (child image) stores references to its parent snapshot to read image data. Hence, the parent snapshot should be protected before it can be used for cloning. When data is written to the COW cloned image, it stores new data references to itself. COW cloned images behave just like regular RBD images. They are quite flexible, which means that they are writable, resizable, and support snapshots and further cloning.

In Ceph RBD, images are of two types: format-1 and format-2. The RBD snapshot feature is available on both types, that is, in format-1 as well as format-2 RBD images. However, the layering feature (the COW cloning feature) is available only for format-2 RBD images. The default RBD image format is format-1.
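If you expect to create format-2 images regularly, you may not want to pass the format flag every time. Depending on your Ceph release, the client side of ceph.conf can usually carry a default for this; the option shown below (rbd default format) is taken from the standard Ceph client configuration reference, but treat this as a hedged sketch and confirm the option against the documentation of your release before relying on it:

[client]
rbd default format = 2

With such a default in place, a plain rbd create would produce format-2 images. In the following recipe, however, we pass --image-format 2 explicitly so that the example works regardless of your client configuration.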
How to do it To demonstrate RBD cloning, we will intentionally create a format-2 RBD image, then create and protect its snapshot, and finally, create COW clones out of it: Create a format-2 RBD image and check its detail: # rbd create rbd2 --size 10240 --image-format 2 --name client.rbd # rbd info --image rbd2 --name client.rbd Create a snapshot of this RBD image: # rbd snap create rbd/rbd2@snapshot_for_cloning --name client.rbd To create a COW clone, protect the snapshot. This is an important step, we should protect the snapshot because if the snapshot gets deleted, all the attached COW clones will be destroyed: # rbd snap protect rbd/rbd2@snapshot_for_cloning --name client.rbd Next, we will create a cloned RBD image using this snapshot: Syntax: rbd clone <pool-name>/<parent-image>@<snap-name> <pool-name>/<child-image-name> # rbd clone rbd/rbd2@snapshot_for_cloning rbd/clone_rbd2 --name client.rbd Creating a clone is a quick process. Once it's completed, check new image information. You would notice that its parent pool, image, and snapshot information would be displayed: # rbd info rbd/clone_rbd2 --name client.rbd At this point, we have a cloned RBD image, which is dependent upon its parent image snapshot. To make the cloned RBD image independent of its parent, we need to flatten the image, which involves copying the data from the parent snapshot to the child image. The time it takes to complete the flattening process depends on the size of the data present in the parent snapshot. Once the flattening process is completed, there is no dependency between the cloned RBD image and its parent snapshot. To initiate the flattening process, use the following: # rbd flatten rbd/clone_rbd2 --name client.rbd # rbd info --image clone_rbd2 --name client.rbd After the completion of the flattening process, if you check image information, you will notice that the parent image/snapshot name is not present and the clone is independent. You can also remove the parent image snapshot if you no longer require it. Before removing the snapshot, you first have to unprotect it: # rbd snap unprotect rbd/rbd2@snapshot_for_cloning --name client.rbd Once the snapshot is unprotected, you can remove it: # rbd snap rm rbd/rbd2@snapshot_for_cloning --name client.rbd A quick look at OpenStack OpenStack is an open source software platform for building and managing public and private cloud infrastructure. It is being governed by an independent, non-profit foundation known as the OpenStack foundation. It has the largest and the most active community, which is backed by technology giants such as, HP, Red Hat, Dell, Cisco, IBM, Rackspace, and many more. OpenStack's idea for cloud is that it should be simple to implement and massively scalable. OpenStack is considered as the cloud operating system where users are allowed to instantly deploy hundreds of virtual machines in an automated way. It also provides an efficient way of hassle free management of these machines. OpenStack is known for its dynamic scale up, scale out, and distributed architecture capabilities, making your cloud environment robust and future-ready. OpenStack provides an enterprise class Infrastructure-as-a-service (IaaS) platform for all your cloud needs. As shown in the preceding diagram, OpenStack is made up of several different software components that work together to provide cloud services. Out of all these components, in this article, we will focus on Cinder and Glance, which provide block storage and image services respectively. 
For more information on OpenStack components, please visit http://www.openstack.org/.

Ceph – the best match for OpenStack
Over the last few years, OpenStack has become amazingly popular, as it takes a software-defined approach to a wide range of infrastructure, whether it's computing, networking, or even storage. And when you talk storage for OpenStack, Ceph gets all the attention. An OpenStack user survey, conducted in May 2015, showed Ceph dominating the block storage driver market with a whopping 44% production usage. Ceph provides the robust, reliable storage backend that OpenStack was looking for. Its seamless integration with OpenStack components such as cinder, glance, nova, and keystone provides an all-in-one cloud storage backend for OpenStack. Here are some key benefits that make Ceph the best match for OpenStack:
Ceph provides an enterprise-grade, feature-rich storage backend at a very low cost per gigabyte, which helps to keep the OpenStack cloud deployment price down.
Ceph is a unified storage solution for Block, File, or Object storage for OpenStack, allowing applications to use storage as they need.
Ceph provides advanced block storage capabilities for OpenStack clouds, which include the easy and quick spawning of instances, as well as the backup and cloning of VMs.
It provides default persistent volumes for OpenStack instances that can work like traditional servers, where data is not lost when the VMs are rebooted.
Ceph helps OpenStack to be host-independent by supporting VM migrations and scaling up storage components without affecting VMs.
It provides the snapshot feature for OpenStack volumes, which can also be used as a means of backup.
Ceph's copy-on-write cloning feature allows OpenStack to spin up several instances at once, which helps the provisioning mechanism function faster.
Ceph supports rich APIs for both Swift and S3 Object storage interfaces.

The Ceph and OpenStack communities have been working closely for the last few years to make the integration more seamless, and to make use of new features as they land. In the future, we can expect that OpenStack and Ceph will be more closely associated due to Red Hat's acquisition of Inktank, the company behind Ceph; Red Hat is one of the major contributors to the OpenStack project.

OpenStack is a modular system, in which each component handles a specific set of tasks. There are several components that require a reliable storage backend, such as Ceph, and extend full integration to it, as shown in the following diagram. Each of these components uses Ceph in its own way to store block devices and objects. The majority of cloud deployments based on OpenStack and Ceph use the Cinder, Glance, and Swift integrations with Ceph. Keystone integration is used when you need an S3-compatible object storage on the Ceph backend. Nova integration allows boot from Ceph volume capabilities for your OpenStack instances.

Setting up OpenStack
The OpenStack setup and configuration is beyond the scope of this article; however, for ease of demonstration, we will use a virtual machine preinstalled with the OpenStack RDO Juno release. If you like, you can also use your own OpenStack environment and perform the Ceph integration.

How to do it
In this section, we will demonstrate setting up a preconfigured OpenStack environment using vagrant, and accessing it via the CLI and GUI:
Launch openstack-node1 using Vagrantfile.
Make sure that you are on the host machine and are under the ceph-cookbook repository before bringing up openstack-node1 using vagrant: # cd ceph-cookbook # vagrant up openstack-node1 Once openstack-node1 is up, check the vagrant status and log in to the node: $ vagrant status openstack-node1 $ vagrant ssh openstack-node1 We assume that you have some knowledge of OpenStack and are aware of its operations. We will source the keystone_admin file, which has been placed under /root, and to do this, we need to switch to root: $ sudo su - $ source keystone_admin We will now run some native OpenStack commands to make sure that OpenStack is set up correctly. Please note that some of these commands do not show any information, since this is a fresh OpenStack environment and does not have instances or volumes created: # nova list # cinder list # glance image-list You can also log in to the OpenStack horizon web interface (https://192.168.1.111/dashboard) with the username as admin and password as vagrant. After logging in the Overview page opens: Configuring OpenStack as Ceph clients OpenStack nodes should be configured as Ceph clients in order to access the Ceph cluster. To do this, install Ceph packages on OpenStack nodes and make sure it can access the Ceph cluster. How to do it In this section, we are going to configure OpenStack as a Ceph client, which will be later used to configure cinder, glance, and nova: We will use ceph-node1 to install Ceph binaries on os-node1 using ceph-deploy. To do this, we should set up an ssh password-less login to os-node1. The root password is again the same (vagrant): $ vagrant ssh ceph-node1 $ sudo su - # ping os-node1 -c 1 # ssh-copy-id root@os-node1 Next, we will install Ceph packages to os-node1 using ceph-deploy: # cd /etc/ceph # ceph-deploy install os-node1 Push the Ceph configuration file, ceph.conf, from ceph-node1 to os-node1. This configuration file helps clients reach the Ceph monitor and OSD machines. Please note that you can also manually copy the ceph.conf file to os-node1 if you like: # ceph-deploy config push os-node1 Make sure that the ceph.conf file that we have pushed to os-node1 should have the permission of 644. Create Ceph pools for cinder, glance, and nova. You may use any available pool, but it's recommended that you create separate pools for OpenStack components: # ceph osd pool create images 128 # ceph osd pool create volumes 128 # ceph osd pool create vms 128 Set up client authentication by creating a new user for cinder and glance: # ceph auth get-or-create client.cinder mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images' # ceph auth get-or-create client.glance mon 'allow r' osd 'allow class-read object_prefix rbd_children, allow rwx pool=images' Add the keyrings to os-node1 and change their ownership: # ceph auth get-or-create client.glance | ssh os-node1 sudo tee /etc/ceph/ceph.client.glance.keyring # ssh os-node1 sudo chown glance:glance /etc/ceph/ceph.client.glance.keyring # ceph auth get-or-create client.cinder | ssh os-node1 sudo tee /etc/ceph/ceph.client.cinder.keyring # ssh os-node1 sudo chown cinder:cinder /etc/ceph/ceph.client.cinder.keyring The libvirt process requires accessing the Ceph cluster while attaching or detaching a block device from Cinder. 
7. The libvirt process requires access to the Ceph cluster while attaching or detaching a block device from cinder. We should create a temporary copy of the client.cinder key, which will be needed for the cinder and nova configuration later in this article:

# ceph auth get-key client.cinder | ssh os-node1 tee /etc/ceph/temp.client.cinder.key

8. At this point, you can test the previous configuration by accessing the Ceph cluster from os-node1 as the client.glance and client.cinder Ceph users. Log in to os-node1 and run the following commands:

$ vagrant ssh openstack-node1
$ sudo su -
# cd /etc/ceph
# ceph -s --name client.glance --keyring ceph.client.glance.keyring
# ceph -s --name client.cinder --keyring ceph.client.cinder.keyring

9. Finally, generate a uuid, then create, define, and set the secret key for libvirt, and remove the temporary key:

Generate a uuid:

# cd /etc/ceph
# uuidgen

Create a secret file and set this uuid in it:

cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>bb90381e-a4c5-4db7-b410-3154c4af486e</uuid>
  <usage type='ceph'>
    <name>client.cinder secret</name>
  </usage>
</secret>
EOF

Make sure that you use the uuid generated for your own environment.

Define the secret and keep the generated secret value safe. We will require this secret value in the next steps:

# virsh secret-define --file secret.xml

Set the secret value that was generated in the last step in virsh and delete the temporary files. Deleting the temporary files is optional; it's done just to keep the system clean:

# virsh secret-set-value --secret bb90381e-a4c5-4db7-b410-3154c4af486e --base64 $(cat temp.client.cinder.key) && rm temp.client.cinder.key secret.xml
# virsh secret-list
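Before moving on, it is worth confirming that the secret now held by libvirt really is the client.cinder key; a mismatch here shows up much later as confusing volume-attach failures. The following is a small sketch run on os-node1; the uuid shown is this article's example value, so substitute your own.

#!/bin/bash
# Compare the base64 key stored in libvirt with the key in the cinder keyring.
SECRET_UUID="bb90381e-a4c5-4db7-b410-3154c4af486e"   # use the uuid you generated

LIBVIRT_KEY=$(virsh secret-get-value "$SECRET_UUID")
KEYRING_KEY=$(awk '/key =/ {print $3}' /etc/ceph/ceph.client.cinder.keyring)

if [ "$LIBVIRT_KEY" = "$KEYRING_KEY" ]; then
    echo "OK: libvirt secret matches the client.cinder keyring"
else
    echo "MISMATCH: re-run virsh secret-set-value with the correct key" >&2
    exit 1
fi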
Configuring Glance for the Ceph backend

We have completed the configuration required from the Ceph side. In this section, we will configure OpenStack glance to use Ceph as its storage backend.

How to do it

This section talks about configuring the glance component of OpenStack to store virtual machine images on the Ceph RBD:

1. Log in to os-node1, which is our glance node, and edit /etc/glance/glance-api.conf to make the following changes:

Under the [DEFAULT] section, make sure that the following lines are present:

default_store=rbd
show_image_direct_url=True

Execute the following command to verify these entries:

# cat /etc/glance/glance-api.conf | egrep -i "default_store|image_direct"

Under the [glance_store] section, make sure that the following lines are present under RBD Store Options:

stores = rbd
rbd_store_ceph_conf=/etc/ceph/ceph.conf
rbd_store_user=glance
rbd_store_pool=images
rbd_store_chunk_size=8

Execute the following command to verify these entries:

# cat /etc/glance/glance-api.conf | egrep -v "#|default" | grep -i rbd

2. Restart the OpenStack glance services:

# service openstack-glance-api restart

3. Source the keystonerc_admin file for OpenStack and list the glance images:

# source /root/keystonerc_admin
# glance image-list

4. Download the cirros image from the Internet, which will later be stored in Ceph:

# wget http://download.cirros-cloud.net/0.3.1/cirros-0.3.1-x86_64-disk.img

5. Add a new glance image using the following command:

# glance image-create --name cirros_image --is-public=true --disk-format=qcow2 --container-format=bare < cirros-0.3.1-x86_64-disk.img

6. List the glance images using the following command; you will notice that there are now two glance images:

# glance image-list

7. You can verify that the new image is stored in Ceph by querying the image ID in the Ceph images pool:

# rados -p images ls --name client.glance --keyring /etc/ceph/ceph.client.glance.keyring | grep -i id

Since we have configured glance to use Ceph as its default storage, all glance images will now be stored in Ceph. You can also try creating images from the OpenStack Horizon dashboard.

8. Finally, we will try to launch an instance using the image that we created earlier:

# nova boot --flavor 1 --image b2d15e34-7712-4f1d-b48d-48b924e79b0c vm1

While you are adding new glance images or creating an instance from a glance image stored on Ceph, you can check the IO on the Ceph cluster by monitoring it with the # watch ceph -s command.
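If you want to look a little more closely at what glance created on Ceph, you can inspect the backing RBD image with the rbd tool instead of rados. The sketch below is illustrative only: it assumes the image name cirros_image used above, and the snapshot it lists (the glance RBD driver typically keeps a protected snapshot that copy-on-write clones are made from) may be named differently in your release.

#!/bin/bash
# Inspect the RBD image that backs a glance image, using the glance Ceph user.
NAME="--name client.glance"
KEYRING="--keyring /etc/ceph/ceph.client.glance.keyring"

# Resolve the glance image ID; the RBD image in the images pool carries the same name.
IMAGE_ID=$(glance image-list | awk '/cirros_image/ {print $2}')

# Show the size, object count, and format of the backing RBD image.
rbd $NAME $KEYRING -p images info "$IMAGE_ID"

# List its snapshots, which are what instance/volume clones are based on.
rbd $NAME $KEYRING -p images snap ls "$IMAGE_ID"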
Configuring Cinder for the Ceph backend

The cinder program of OpenStack provides block storage to virtual machines. In this section, we will configure OpenStack cinder to use Ceph as a storage backend. OpenStack cinder requires a driver to interact with the Ceph block device. On the OpenStack node, edit the /etc/cinder/cinder.conf configuration file and add the code snippet given in the following section.

How to do it

In the last section, we learned how to configure glance to use Ceph. In this section, we will learn how to use the Ceph RBD with the cinder service of OpenStack:

1. Since in this demonstration we are not using a multiple-backend cinder configuration, comment out the enabled_backends option in the /etc/cinder/cinder.conf file.

2. Navigate to the Options defined in cinder.volume.drivers.rbd section of the /etc/cinder/cinder.conf file and add the following (replace the secret uuid with your environment's value):

volume_driver = cinder.volume.drivers.rbd.RBDDriver
rbd_pool = volumes
rbd_user = cinder
rbd_secret_uuid = bb90381e-a4c5-4db7-b410-3154c4af486e
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = -1
glance_api_version = 2

3. Execute the following command to verify the previous entries:

# cat /etc/cinder/cinder.conf | egrep "rbd|rados|version" | grep -v "#"

4. Restart the OpenStack cinder services:

# service openstack-cinder-volume restart

5. Source the keystonerc_admin file for OpenStack:

# source /root/keystonerc_admin
# cinder list

6. To test this configuration, create your first cinder volume of 2 GB, which should now be created on your Ceph cluster:

# cinder create --display-name ceph-volume01 --display-description "Cinder volume on CEPH storage" 2

7. Check the volume by listing the cinder volumes and the Ceph volumes pool:

# cinder list
# rados -p volumes --name client.cinder --keyring ceph.client.cinder.keyring ls | grep -i id

8. Similarly, try creating another volume using the OpenStack Horizon dashboard.

Configuring Nova to attach the Ceph RBD

In order to attach the Ceph RBD to OpenStack instances, we should configure the nova component of OpenStack by adding the rbd user and the uuid information it needs to connect to the Ceph cluster. To do this, we need to edit /etc/nova/nova.conf on the OpenStack node and perform the steps given in the following section.

How to do it

The cinder service that we configured in the last section creates volumes on Ceph; however, to attach these volumes to OpenStack instances, we need to configure nova:

1. Navigate to the Options defined in nova.virt.libvirt.volume section and add the following lines of code (replace the secret uuid with your environment's value):

rbd_user=cinder
rbd_secret_uuid=bb90381e-a4c5-4db7-b410-3154c4af486e

2. Restart the OpenStack nova services:

# service openstack-nova-compute restart

3. To test this configuration, we will attach a cinder volume to an OpenStack instance. List the instances and volumes to get their IDs:

# nova list
# cinder list

4. Attach the volume to the instance:

# nova volume-attach 1cadffc0-58b0-43fd-acc4-33764a02a0a6 1337c866-6ff7-4a56-bfe5-b0b80abcb281
# cinder list

You can now use this volume as a regular block disk from your OpenStack instance.
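A quick way to convince yourself that a cinder volume really lives on Ceph is to look it up with rbd after creating it. The sketch below is only an illustration: it assumes the volume name ceph-volume01 created above and relies on the cinder RBD driver's convention of naming the backing image volume-<volume-id>.

#!/bin/bash
# Map a cinder volume name to its backing RBD image in the "volumes" pool.
NAME="--name client.cinder"
KEYRING="--keyring /etc/ceph/ceph.client.cinder.keyring"

# Get the cinder volume ID from its display name.
VOLUME_ID=$(cinder list | awk '/ceph-volume01/ {print $2}')

# The cinder RBD driver stores the volume as an image called volume-<id>.
rbd $NAME $KEYRING -p volumes info "volume-${VOLUME_ID}"

# Uncomment to watch cluster IO while attaching the volume from another terminal:
# watch ceph -s $NAME $KEYRING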
Configuring Nova to boot the instance from the Ceph RBD

In order to boot all OpenStack instances into Ceph, that is, to use the boot-from-volume feature, we should configure an ephemeral backend for nova. To do this, edit /etc/nova/nova.conf on the OpenStack node and perform the changes shown next.

How to do it

This section deals with configuring nova to store the entire virtual machine on the Ceph RBD:

1. Navigate to the [libvirt] section and add the following:

inject_partition=-2
images_type=rbd
images_rbd_pool=vms
images_rbd_ceph_conf=/etc/ceph/ceph.conf

2. Verify your changes:

# cat /etc/nova/nova.conf | egrep "rbd|partition" | grep -v "#"

3. Restart the OpenStack nova services:

# service openstack-nova-compute restart

4. To boot a virtual machine in Ceph, the glance image format must be RAW. We will use the same cirros image that we downloaded earlier in this article and convert it from the QCOW2 format to the RAW format (this is important). You can also use any other image, as long as it's in the RAW format:

# qemu-img convert -f qcow2 -O raw cirros-0.3.1-x86_64-disk.img cirros-0.3.1-x86_64-disk.raw

5. Create a glance image using the RAW image:

# glance image-create --name cirros_raw_image --is-public=true --disk-format=raw --container-format=bare < cirros-0.3.1-x86_64-disk.raw

6. To test the boot-from-Ceph-volume feature, create a bootable volume:

# nova image-list
# cinder create --image-id ff8d9729-5505-4d2a-94ad-7154c6085c97 --display-name cirros-ceph-boot-volume 1

7. List the cinder volumes to check whether the bootable field is true:

# cinder list

8. Now we have a bootable volume stored on Ceph, so let's launch an instance with this volume:

# nova boot --flavor 1 --block_device_mapping vda=fd56314b-e19b-4129-af77-e6adf229c536::0 --image 964bd077-7b43-46eb-8fe1-cd979a3370df vm2_on_ceph

Here, --block_device_mapping vda takes the ID of the cinder bootable volume, and --image takes the ID of the glance image associated with that bootable volume.

9. Finally, check the instance status:

# nova list

At this point, we have an instance running from a Ceph volume. Try to log in to the instance from the Horizon dashboard.

Summary

In this article, we integrated Ceph as the storage backend for an OpenStack cloud: we configured the OpenStack node as a Ceph client, stored glance images on the Ceph RBD, backed cinder volumes with Ceph, and configured nova to attach Ceph volumes to instances and to boot instances from them.