Search icon CANCEL
Subscription
0
Cart icon
Your Cart (0 item)
Close icon
You have no products in your basket yet
Save more on your purchases! discount-offer-chevron-icon
Savings automatically calculated. No voucher code required.
Arrow left icon
Explore Products
Best Sellers
New Releases
Books
Videos
Audiobooks
Learning Hub
Newsletter Hub
Free Learning
Arrow right icon

How-To Tutorials - Cloud Computing

121 Articles
article-image-openstack-networking-and-security-with-ansible-2
Vijin Boricha
03 Jul 2018
9 min read
Save for later

Automating OpenStack Networking and Security with Ansible 2 [Tutorial]

Vijin Boricha
03 Jul 2018
9 min read
OpenStack is a software that can help us build a system similar to popular cloud providers, such as AWS or GCP. OpenStack provides an API and a dashboard to manage the resources that it controls. Basic operations, such as creating and managing virtual machines, block storage, object storage, identity management, and so on, are supported out of the box. This is an excerpt from Ansible 2 Cloud Automation Cookbook written by Aditya Patawari, Vikas Aggarwal. No matter the Cloud Platform you are using this book will help you orchestrate your cloud infrastructure.  In the case of OpenStack, we control the underlying hardware and network, which comes with its own pros and cons. In this article we will leverage Ansible 2 to automate not so common networking tasks in OpenStack. We can use custom network solutions. We can use economical equipment or high-end devices, depending upon the actual need. This can help us get the features that we want and may end up saving money. Caution: Although OpenStack can be hosted on premises, several cloud providers provide OpenStack as a service. Sometimes these cloud providers may choose to turn off certain features or provide add-on features. Sometimes, even while configuring OpenStack on a self-hosted environment, we may choose to toggle certain features or configure a few things differently. Therefore, inconsistencies may occur. All the code examples in this article are tested on a self-hosted OpenStack released in August 2017, named Pike. The underlying operating system was CentOS 7.4. Managing security groups Security groups are the firewalls that can be used to allow or disallow the flow of traffic. They can be applied to virtual machines. Security groups and virtual machines have a many-to-many relationship. A single security group can be applied to multiple virtual machines and a single virtual machine can have multiple security groups. How to do it… Let's create a security group as follows: - name: create a security group for web servers os_security_group: name: web-sg state: present description: security group for web servers The name parameter has to be unique. The description parameter is optional, but we recommend using it to state the purpose of the security group. The preceding task will create a security group for us, but there are no rules attached to it. A firewall without any rules is of little use. So let's go ahead and add a rule to allow access to port 80 as follows: - name: allow port 80 for http os_security_group_rule: security_group: web-sg protocol: tcp port_range_min: 80 port_range_max: 80 remote_ip_prefix: 0.0.0.0/0 We also need SSH access to this server, so we should allow port 22 as well: - name: allow port 80 for SSH os_security_group_rule: security_group: web-sg protocol: tcp port_range_min: 22 port_range_max: 22 remote_ip_prefix: 0.0.0.0/0 How it works… For this module, we need to specify the name of the security group. The rule that we are creating will be associated with this group. We have to supply the protocol and the port range information. If we just want to whitelist only one port, then that would be the upper and lower bound of the range. Lastly, we have to specify the allowed addresses in the form of CIDR. The address 0.0.0.0/0 signifies that port 80 is open for everyone. This task will add an ingress type rule and allow traffic on port 80 to reach the instance. Similarly, we have to add a rule to allow traffic on port 22 as well. Managing network resources A network is a basic building block of the infrastructure. Most of the cloud providers will supply a sample or default network. While setting up a self-hosted OpenStack instance, a single network is typically created automatically. However, if the network is not created, or if we want to create another network for the purpose of isolation or compliance, we can do so using the os_network module. How to do it… Let's go ahead and create an isolated network and name it private, as follows: - name: creating a private network os_network: state: present name: private In the preceding example, we created a logical network with no subnets. A network with no subnets is of little use, so the next step would be to create a subnet: - name: creating a private subnet os_subnet: state: present network_name: private name: app cidr: 192.168.0.0/24 dns_nameservers: - 8.8.4.4 - 8.8.8.8 host_routes: - destination: 0.0.0.0/0 nexthop: 104.131.86.234 - destination: 192.168.0.0/24 nexthop: 192.168.0.1 How it works… The preceding task will create a subnet named app in the network called private. We have also supplied a CIDR for the subnet, 192.168.0.0/24. We are using Google DNS for nameservers as an example here, but this information should be obtained from the IT department of the organization. Similarly, we have set up the example host routes, but this information should be obtained from the IT department as well. After successful execution of this recipe, our network is ready to use. User management OpenStack provides an elaborate user management mechanism. If we are coming from a typical third-party cloud provider, such as AWS or GCP, then it can look overwhelming. The following list explains the building blocks of user management: Domain: This is a collection of projects and users that define an administrative entity. Typically, they can represent a company or a customer account. For a self-hosted setup, this could be done on the basis of departments or environments. A user with administrative privileges on the domain can further create projects, groups, and users. Group: A group is a collection of users owned by a domain. We can add and remove privileges from a group and our changes will be applied to all the users within the group. Project: A project creates a virtual isolation for resources and objects. This can be done to separate departments and environments as well. Role: This is a collection of privileges that can be applied to groups or users. User: A user can be a person or a virtual entity, such as a program, that accesses OpenStack services. For a complete documentation of the user management components, go through the OpenStack Identity document at https://docs.openstack.org/keystone/pike/admin/identity-concepts.html.  How to do it… Let's go ahead and start creating some of these basic building blocks of user management. We should note that, most likely, a default version of these building blocks will already be present in most of the setups: We'll start with a domain called demodomain, as follows: - name: creating a demo domain os_keystone_domain: name: demodomain description: Demo Domain state: present register: demo_domain After we get the domain, let's create a role, as follows: - name: creating a demo role os_keystone_role: state: present name: demorole Projects can be created, as follows: - name: creating a demo project os_project: state: present name: demoproject description: Demo Project domain_id: "{{ demo_domain.id }}" enabled: True Once we have a role and a project, we can create a group, as follows: - name: creating a demo group os_group: state: present name: demogroup description: "Demo Group" domain_id: "{{ demo_domain.id }}" Let's create our first user: - name: creating a demo user os_user: name: demouser password: secret-pass update_password: on_create email: demo@example.com domain: "{{ demo_domain.id }}" state: present Now we have a user and a group. Let's add the user to the group that we created before: - name: adding user to the group os_user_group: user: demouser group: demogroup We can also associate a user or a group with a role: - name: adding role to the group os_user_role: group: demo2 role: demorole domain: "{{ demo_domain.id }}" How it works… In step 1, the os_keystone_domain would take a name as a mandatory parameter. We also supplied a description for our convenience. We are going to use the details of this domain, so we saved it in a variable called demo_domain. In step 2, the os_keystone_role would just take a name and create a role. Note that a role is not tied up with a domain. In step 3, the os_project module would require a name. We have added the description for our convenience. The projects are tied to a domain, so we have used the demo_domain variable that we registered in a previous task. In step 4, the groups are tied to domains as well. So, along with the name, we would specify the description and domain ID like we did before. At this point, the group is empty, and there are no users associated with this group. In step 5, we supply name along with a password for the user. The update_password parameter is set to on_create, which means that the password won't be modified for an existing user. This is great for the sake of idempotency. We also specify the email ID. This would be required for recovering the password and several other use cases. Lastly, we add the domain ID to create the user in the right domain. In step 6, the os_user_group module will help us associate the demouser with the demogroup. In step 7, the os_user_role will take a parameter for user or group and associate it with a role. A lot of these divisions might not be required for every organization. We recommend going through the documentation and understanding the use case of each of them. Another point to note is that we might not even see the user management bits on a day-to-day basis. Depending on the setup and our responsibilities, we might only interact with modules that involve managing resources, such as virtual machines and storage. We learned how to successfully to solve complex OpenStack networking tasks with Ansible 2. To learn more about managing other public cloud platforms like AWS and Azure refer to our book  Ansible 2 Cloud Automation Cookbook. Getting Started with Ansible 2 System Architecture and Design of Ansible An In-depth Look at Ansible Plugins
Read more
  • 0
  • 0
  • 3659

article-image-create-configure-azure-virtual-machine
Gebin George
25 May 2018
13 min read
Save for later

How to create and configure an Azure Virtual Machine

Gebin George
25 May 2018
13 min read
Creating virtual machines on Azure gives you on-demand, high-scale, secure, virtualized infrastructure using Windows Server. Virtual machine helps you deploy and scale applications easily. In this article, we will learn how to run an Azure Virtual Machine. This tutorial is an excerpt from the book, Hands-On Networking with Azure, written by Mohamed Waly. This book will help you efficiently monitor, diagnose, and troubleshoot Azure Networking. Creating an Azure VM is a very straightforward process – all you have to do is follow the given steps: Navigate to the Azure portal and search for Virtual Machines, as shown in the following screenshot: Figure 3.1: Searching for Virtual Machines Once the VM blade is opened, you can click on +Add to create a new VM, as shown in the following screenshot: Figure 3.2: Virtual Machines blade Once you have clicked on +Add, a new blade will pop up where you have to search for and select the desired OS for the VM, as shown in the following screenshot: Figure 3.3: Searching for Windows Server 2016 OS for the VM Once the OS is selected, you need to select the deployment model, whether that be Resource Manager or Classic, as shown in the following screenshot: Figure 3.4: Selecting the deployment model Once the deployment model is selected, a new blade will pop up where you have to specify the following: Name: Specify the name of the VM. VM disk type: Specify whether the disk type will be SSD or HDD. Consider that SSD will offer consistent, low-latency performance, but will incur more charges. Note that this option is not available for the Classic model in this blade, but is available in the Configure optional features blade. User name: Specify the username that will be used to log on the VM. Password: Specify the password, which must be between 12 and 123 characters long and must contain three of the following: one lowercase character, one uppercase character, one number, and one special character that is not or -. Subscription: This specifies the subscription that will be charged for the VM usage. Resource group: This specifies the resource group within which the VM will exist. Location: Specify the location in which the VM will be created. It is recommended that you select the nearest location to you. Save money: Here, you specify whether you own Windows Server Licenses with active Software Assurance (SA). If you do, Azure Hybrid Benefit is recommended to save compute costs. For more information about Azure Hybrid Benefit, you can check this page. Figure 3.5: Configure the VM basic settings Once you have clicked on OK, a new blade will pop up where you have to specify the VM size so that the VMs series can select the one that will fulfil your needs, as shown in the following screenshot: Figure 3.6: Select the VM size Once the VM size has been specified, you need to specify the following settings: Availability set: This option provides High availability for the VM by setting the VMs within the same application and availability set. Here, the VMs will be in different fault and update domains, granting the VMs high availability (up to 99.95% of Azure SLA). Use managed disks: Enable this feature to have Azure automatically manage the availability of disks to provide data redundancy and fault tolerance without creating and managing storage accounts on your own. This setting is not available in the Classic model. Virtual network: Specify the virtual network to which you want to assign the VM. Subnet: Select the subnet within the virtual network that you specified earlier to assign the VM to. Public IP address: Either select an existing public IP address or create a new one. Network security group (firewall): Select the NSG you want to assign to the VM NIC. This is called endpoints in the Classic model. Extensions: You can add more features to the VM using extensions, such as configuration management, antivirus protection, and so on. Auto-shutdown: Specify whether you want to shut down your VM daily or not; if you do, you can set a schedule. Considering that this option will help you saving compute cost especially for dev and test scenarios. This is not available in the Classic model. Notification before shutdown: Check this if you enabled Auto-shutdown and want to subscribe for notifications before the VM shuts down. This is not available in the Classic model. Boot diagnostics: This captures serial console output and screenshots of the VM running on a host to help diagnose start up issues. Guest OS diagnostics: This obtains metrics for the VM every minute; you can use these metrics to create alerts and stay informed of your applications. Diagnostics storage account: This is where metrics are written, so you can analyze them with your own tools. Figure 3.7: Specify more settings for the VM Enabling Boot diagnostics and Guest diagnostics will incur more charges since the diagnostics will need a dedicated storage account to store their data. Finally, once you are done with its settings, Azure will validate those you have specified and summarize them, as shown in the following screenshot: Figure 3.8: VM Settings Summary Once clicked on, Create the VM will start the creation process, and within minutes the VM will be created. Once the VM is created, you can navigate to the Virtual Machines blade to open the VM that has been created, as shown in the following screenshot: Figure 3.9: The created VM overview To connect to the VM, click on Connect, where a pre-configured RDP file with the required VM information will be downloaded. Open the RDP file. You will be asked to enter the username and password you specified for the VM during its configuration, as shown in the following screenshot: Figure 3.10: Entering the VM credentials Voila! You should now be connected to the VM. Azure VMs networking There are many network configurations that can be done for the VM. You can add additional NICs, change the private IP address, set a private or public IP address to be either static or dynamic, and you can change the inbound and outbound security rules. Adding inbound and outbound rules Adding inbound and outbound security rules to the VM NIC is a very simple process; all you need to do is follow these steps: Navigate to the desired VM. Scroll down to Networking, under SETTINGS, as shown in the following screenshot: Figure 3.11: VM networking settings To add inbound and outbound security rules, you have to click on either Add inbound or Add outbound. Once clicked on, a new blade will pop up where you have to specify settings using the following fields: Service: The service specifies the destination protocol and port range for this rule. Here, you can choose a predefined service, such as RDP or SSH, or provide a custom port range. Port ranges: Here, you need to specify a single port, a port range, or a comma-separated list of single ports or port ranges. Priority: Here, you enter the desired priority value. Name: Specify a name for the rule here. Description: Write a description for the rule that relates to it here. Figure 3.12: Adding an inbound rule Once you have clicked OK, the rule will be applied. Note that the same process applies when adding an outbound rule. Adding an additional NIC to the VM Adding an additional NIC starts from the same blade as adding inbound and outbound rules. To add an additional NIC, you have to follow the given steps: Before adding an additional NIC to the VM, you need to make sure that the VM is in a Stopped (Deallocated) status. Navigate to Networking on the desired VM. Click on Attach network interface, and a new blade will pop up. Here, you have to either create a network interface or select an existing one. If you are selecting an existing interface, simply click on OK and you are done. If you are creating a new interface, click on Create network interface, as shown in the following screenshot: Figure 3.13: Attaching network interface A new blade will pop up where you have to specify the following: Name: The name of the new NIC. Virtual network: This field will be grayed out because you cannot attach a VM's NIC to different virtual networks. Subnet: Select the desired subnet within the virtual network. Private IP address assignment: Specify whether you want to allocate this IP dynamically or statically. Network security group: Specify an NSG to be assigned to this NIC. Private IP address (IPv6): If you want to assign an IPv6 to this NIC, check this setting. Subscription: This field will be grayed out because you cannot have a VM's NIC in a different subscription. Resource group: Specify the resource group to which the NIC will exist. Location: This field will be grayed out because you cannot have VM NICs in different locations. Figure 3.14: Specify the NIC settings Once you are done, click Create. Once the network interface is created, you will return to the previous blade. Here, you need to specify the NIC you just created and click on OK, as shown in the following screenshot: Figure 3.15: Attaching the NIC Configuring the NICs The Network Interface Cards (NICs) include some configuration that you might be interested in. They are as follows: To navigate to the desired NIC, you can search for the network interfaces blade, as shown in the following screenshot: Figure 3.16: Searching for network interfaces blade Then, the blade will pop up, from which you can select the desired NIC, as shown in the following screenshot: Figure 3.17: Select the desired NIC You can also navigate back to the VM via | Networking | and then click on the desired NIC, as shown in the following screenshot: Figure 3.18: The VM NIC To configure the NIC, you need to follow the given steps: Once the NIC blade is opened, navigate to IP configurations, as shown in the following screenshot: Figure 3.19: NIC blade overview To enable IP forwarding, click on Enabled and then click Save. Enabling this feature will cause the NIC to receive traffic that is not destined to its own IP address. Traffic will be sent with a different source IP. To add another IP to the NIC, click on Add, and a new blade will pop up, for which you have to specify the following: Name: The name of the IP. Type: This field will be grayed out because a primary IP already exists. Therefore, this one will be secondary. Allocation: Specify whether the allocation method is static or dynamic. IP address: Enter the static IP address that belongs to the same subnet that the NIC belongs to. If you have selected dynamic allocation, you cannot enter the IP address statically. Public IP address: Specify whether or not you need a public IP address for this IP configuration. If you do, you will be asked to configure the required settings. Figure 3.20: Configure the IP configuration settings Click on Configure required settings for the public IP address and a new blade will pop up from which you can select an existing public IP address or create a new one, as shown in the following screenshot: Figure 3.21: Create a new public IP address Click on OK and you will return to the blade, as shown in Figure 3.20, with the following warning: Figure 3.22: Warning for adding a new IP address In this case, you need to plan for the addition of a new IP address to ensure that the time the VM is restarted is not during working hours. Azure VNets considerations for Azure VMs Building VMs in Azure is a common task, but to do this task well, and to make it operate properly, you need to understand the considerations of Azure VNets for Azure VMs. These considerations are as follows: Azure VNets enable you to bring your own IPv4/IPv6 addresses and assign them to Azure VMs, statically or dynamically You do not have access to the role that acts as DHCP or provides IP addresses; you can only control the ranges you want to use in the form of address ranges and subnets Installing a DHCP role on one of the Azure VMs is currently unsupported; this is because Azure does not use traditional Layer-2 or Layer-3 topology, and instead uses Layer-3 traffic with tunneling to emulate a Layer-2 LAN Private IP addresses can be used for internal communication; external communication can be done via public IP addresses You can assign multiple private and public IP addresses to a single VM You can assign multiple NICs to a single VM By default, all the VMs within the same virtual network can communicate with each other, unless otherwise specified by an NSG on a subnet within this virtual network The network security group (NSG) can sometimes cause an overhead; without this overhead, however, all VMs within the same subnet would communicate with each other By default, an inbound security rule is created for remote desktops for Windows-based VMs, and SSH for Linux-based VMs The inbound security rules are first applied on the NSG of the subnet and then the VM NIC NSG – for example, if the subnet's NSG allows HTTP traffic, it will pass through it; however, it may not reach its destination if the VM NIC NSG does not allow it The outbound security rules are applied for the VM NIC NSG first, and then applied on the subnet NSG Multiple NICs assigned to a VM can exist in different subnets Azure VMs with multiple NICs in the same availability set do not have to have the same number of NICs, but the VMs must have at least two NICs When you attach an NIC to a VM, you need to ensure that they exist in the same location and subscription The NIC and the VNet must exist in the same subscription and location The NIC's MAC address cannot be changed until the VM to which the NIC is assigned is deleted Once the VM is created, you cannot change the VNet to which it is assigned; however, you can change the subnet to which the VM is assigned You cannot attach an existing NIC to a VM during its creation, but you can add an existing NIC as an additional NIC By default, a dynamic public IP address is assigned to the VM during creation, but this address will change if the VM is stopped or deleted; to ensure it will not change, you need to ensure its IP address is static In a multi-NIC VM, the NSG that is applied to one NIC does not affect the others If you found this post useful, do check out the book Hands-On Networking with Azure, to design and implement Azure Networking for Azure VMs. Read More Introducing Azure Sphere – A secure way of running your Internet of Things devices Learn Azure Serverless computing for free: Download e-book
Read more
  • 0
  • 0
  • 6555

article-image-secure-azure-virtual-network
Gebin George
18 May 2018
7 min read
Save for later

How to secure an Azure Virtual Network

Gebin George
18 May 2018
7 min read
The most common question that anyone asks when they buy a service is, can it be secured? The answer to that question, in this case, is absolutely yes. In this tutorial, we will learn to secure your connection between virtual machines. On top of the security, Microsoft provides for Azure as a vendor, there are some configurations that you can do at your end to increase the level of security to your virtual network. For a higher level of security, you can use the following: NSG: It is like a firewall that controls the inbound and outbound traffic by specifying which traffic is allowed to flow to/from the NIC/subnet Distributed denial of service (DDoS) protection: It is used to prevent DDoS attacks and at the time of writing is in preview This tutorial is an excerpt from the book, Hands-On Networking with Azure, written by Mohamed Waly.  Network Security Groups (NSG) NSG controls the flow of traffic by specifying which traffic is allowed to enter or exit the network. Creating NSG Creating NSG is a pretty straightforward process. To do it, you need to follow these steps: Navigate to Azure portal, and search for network security groups, as shown in the following screenshot: Figure 2.13: Searching for network security groups Once you have clicked on it, a new blade will be opened wherein all the created NSGs are located, as shown in the following screenshot: Figure 2.12: Network security groups blade Click on Add and a new blade will pop up, where you have to specify the following: Name: The name of the NSG Subscription: The subscription, which will be charged for NSG usage Resource group: The resource group within which the NSG will be located as a resource Location: The region where this resource will be created Figure 2.13: Creating an NSG Once you have clicked on Create, the NSG will be created within seconds. Inbound security rules By default, all the subnets and NICs that are not associated with NSG have all the inbound traffic allowed and once they are associated with an NSG, the following inbound security rules are assigned to them as they are a default part of any NSG: AllowVnetInBound: Allows all the inbound traffic that comes from a virtual network AllowAzureLoadBalancerInBound: Allows all the inbound traffic that comes from Load Balancer DenyAllInbound: Denies all the inbound traffic that comes from any source Figure 2.14: Default inbound security rules As shown in the previous screenshot, the rule consists of some properties, such as PRIORITY, NAME, PORT, and so on. It is important to understand what these properties mean for a better understanding of security rules. So, let's go ahead and explain them: PRIORITY: A number assigned to each rule to specify which rule has a higher priority than the other. The lower the number, the higher the priority. You can specify a priority with any number between 100 and 4096. NAME: The name of the rule. The same name cannot be reused within the same network security group. PORT: The allowed port through which the traffic will flow to the network. PROTOCOL: Specify whether the protocol you are using is TCP or UDP. SOURCE and DESTINATION: The source can be any, an IP address range, or a service tag. You can remove the default rules by clicking on Default rules. You can customize your own inbound rules, by following these steps: On the Inbound security rules blade, click on Add. A new blade will pop up, where you have to specify the following: Source: The source can be Any, an IP address range, or a service tag. It specifies the incoming traffic from a specific source IP address range that will be allowed or denied by this rule. Source port ranges: You can provide a single port, such as 80, a port range, such as 1024 - 65535, or a comma-separated list of single ports and/or port ranges, such as 80, 1024 - 65535. This specifies on which ports traffic will be allowed or denied by this rule. Provide an asterisk (*) to allow traffic on any port. Destination: The destination can be Any, an IP address range, or a virtual network. It specifies the outing traffic to a specific destination IP address range that will be allowed or denied by this rule. Destination port ranges: What applies for the source port ranges, applies for the destination port ranges. Protocol: It can be Any, TCP, or UDP. Action: Whether to Allow the rule or to Deny it. Priority: As mentioned earlier, the lower the number, the higher the priority. The priority number must be between 100 - 4096. Name: The name of the rule. Description: The description of the rule, which will help you to differentiate between the rules. In our scenario, I want to allow all the incoming connections to access a website published on a web server located in a virtual network, as shown in the following screenshot: Figure 2.15: Creating an inbound security rule Once you click on OK, the rule will be created. Outbound security rules Outbound security rules are no different than inbound security rules, except inbound rules are meant for inbound traffic and outbound rules are meant for outbound traffic. Otherwise, everything else is similar. Associating the NSG Once you have the NSG created, you can associate it to either an NIC or a subnet. Associating the NSG to an NIC To associate the NSG to an NIC, you need to follow these steps: Navigate to the Network security groups that you have created and then select Network interfaces, as shown in the following screenshot: Figure 2.16: Associated NICs to an NSG Click on Associate. A new blade will pop up, from which you need to select the NIC that you want to associate with the NSG, as shown in the following screenshot: Figure 2.17: NICs to be associated to the NSG Voila! You are done. Associating the NSG to a subnet To associate the NSG to a subnet, you need to follow these steps: Navigate to the Network security groups that you have created and then select Subnets, as shown in the following screenshot: Figure 2.18: Associated subnets to an NSG Click on Associate. A new blade will pop up, where you have to specify the virtual network within which the subnet exists, as shown in the following screenshot: Figure 2.19: Choosing the VNet within which the subnet exists Then, you need to specify which subnet of the VNet you want to associate the NSG to, as shown in the following screenshot: Figure 2.20: Selecting the subnet to which the NSG will be associated Once the subnet is selected, click on OK, and it will take some seconds to get it associated to the NSG. Azure DDoS protection DDoS attacks have spread out lately, by exhausting the application and making it unavailable for use, and you can expect an attack of that type any time. Recently, Microsoft announced the support of Azure DDoS protection as a service for protecting Azure resources, such as Azure VMs, Load Balancers, and Application Gateways. Azure DDoS protection comes in two flavors: Basic: This type has been around for a while as it is already enabled for Azure services to mitigate DDoS attacks. It incurs no charges. Standard: This flavor comes with more enhancements that mitigate attacks, especially for Azure VNet. At the time of writing this book, Azure DDoS protection standard is in preview and it is not available at the portal. You need to request it by filling out a form that is available here. If you found this post useful, do check out the book Hands on Networking with Azure, to design and implement Azure Networking for Azure VMs. Read More The Microsoft Azure Stack Architecture What is Azure API Management?
Read more
  • 0
  • 0
  • 7392
Banner background image

article-image-cloudtrail-logs-amazon-elasticsearch
Vijin Boricha
10 May 2018
5 min read
Save for later

Analyzing CloudTrail Logs using Amazon Elasticsearch

Vijin Boricha
10 May 2018
5 min read
Log management and analysis for many organizations start and end with just three letters: E, L, and K, which stands for Elasticsearch, Logstash, and Kibana. In today's tutorial, we will learn about analyzing CloudTrail logs which are E, L and K. [box type="shadow" align="" class="" width=""]This tutorial is an excerpt from the book AWS Administration - The Definitive Guide - Second Edition written by Yohan Wadia. This book will help you enhance your application delivery skills with the latest AWS services while also securing and monitoring the environment workflow.[/box] The three open-sourced products are essentially used together to aggregate, parse, search and visualize logs at an enterprise scale: Logstash: Logstash is primarily used as a log collection tool. It is designed to collect, parse, and store logs originating from multiple sources, such as applications, infrastructure, operating systems, tools, services, and so on. Elasticsearch: With all the logs collected in one place, you now need a query engine to filter and search through these logs for particular events. That's exactly where Elasticsearch comes into play. Elasticsearch is basically a search server based on the popular information retrieval software library, Lucene. It provides a distributed, full-text search engine along with a RESTful web interface for querying your logs. Kibana: Kibana is an open source data visualization plugin, used in conjunction with Elasticsearch. It provides you with the ability to create and export your logs into various visual graphs, such as bar charts, scatter graphs, pie charts, and so on. You can easily download and install each of these components in your AWS environment, and get up and running with your very own ELK stack in a matter of hours! Alternatively, you can also leverage AWS own Elasticsearch service! Amazon Elasticsearch is a managed ELK service that enables you to quickly deploy operate, and scale an ELK stack as per your requirements. Using Amazon Elasticsearch, you eliminate the need for installing and managing the ELK stack's components on your own, which in the long run can be a painful experience. For this particular use case, we will leverage a simple CloudFormation template that will essentially set up an Amazon Elasticsearch domain to filter and visualize the captured CloudTrail Log files, as depicted in the following diagram: To get started, log in to the CloudFormation dashboard, at https://console.aws.amazon.com/cloudformation. Next, select the option Create Stack to bring up the CloudFormation template selector page. Paste http://s3.amazonaws.com/concurrencylabs-cfn-templates/cloudtrail-es-cluster/cloudtrail-es-cluster.json in, the Specify an Amazon S3 template URL field, and click on Next to continue. In the Specify Details page, provide a suitable Stack name and fill out the following required parameters: AllowedIPForEsCluster: Provide the IP address that will have access to the nginx proxy and, in turn, have access to your Elasticsearch cluster. In my case, I've provided my laptop's IP. Note that you can change this IP at a later stage, by visiting the security group of the nginx proxy once it has been created by the CloudFormation template. CloudTrailName: Name of the CloudTrail that we set up at the beginning of this chapter. KeyName: You can select a key-pair for obtaining SSH to your nginx proxy instance: LogGroupName: The name of the CloudWatch Log Group that will act as the input to our Elasticsearch cluster. ProxyInstanceTypeParameter: The EC2 instance type for your proxy instance. Since this is a demonstration, I've opted for the t2.micro instance type. Alternatively, you can select a different instance type as well. Once done, click on Next to continue. Review the settings of your stack and hit Create to complete the process. The stack takes a good few minutes to deploy as a new Elasticsearch domain is created. You can monitor the progress of the deployment by either viewing the CloudFormation's Output tab or, alternatively, by viewing the Elasticsearch dashboard. Note that, for this deployment, a default t2.micro.elasticsearch instance type is selected for deploying Elasticsearch. You should change this value to a larger instance type before deploying the stack for production use. You can view information on Elasticsearch Supported Instance Types at http://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/aes-supported-instance-types.html. With the stack deployed successfully, copy the Kibana URL from the CloudFormation Output tab: "KibanaProxyEndpoint": "http://<NGINX_PROXY>/_plugin/kibana/" The Kibana UI may take a few minutes to load. Once it is up and running, you will need to configure a few essential parameters before you can actually proceed. Select Settings and hit the Indices option. Here, fill in the following details: Index contains time-based events: Enable this checkbox to index time-based events Use event times to create index names: Enable this checkbox as well Index pattern interval: Set the Index pattern interval to Daily from the drop-down list Index name of pattern: Type [cwl-]YYYY.MM.DD in to this field Time-field name: Select the @timestamp value from the drop-down list Once completed, hit Create to complete the process. With this, you should now start seeing logs populate on to Kibana's dashboard. Feel free to have a look around and try out the various options and filters provided by Kibana:   Phew! That was definitely a lot to cover! But wait, there's more! AWS provides yet another extremely useful governance and configuration management service AWS Config, know more from this book AWS Administration - The Definitive Guide - Second Edition. The Cloud and the DevOps Revolution Serverless computing wars: AWS Lambdas vs Azure Functions
Read more
  • 0
  • 0
  • 8365

article-image-create-aws-cloudtrail
Vijin Boricha
09 May 2018
8 min read
Save for later

How to create your own AWS CloudTrail

Vijin Boricha
09 May 2018
8 min read
AWS provides a wide variety of tools and managed services which allow you to safeguard your applications running on the cloud, such as AWS WAF and AWS Shield. But this, however, just forms one important piece of a much larger jigsaw puzzle! What about compliance monitoring, risk auditing, and overall governance of your environments? How do you effectively analyze events occurring in your environment and mitigate against the same? Well, luckily for us, AWS has the answer to our problems in the form of AWS CloudTrail. In today's post, we will explore AWS CloudTrail and learn how to create our own CloudTrail trail. [box type="shadow" align="" class="" width=""]This tutorial is an excerpt from the book AWS Administration - The Definitive Guide - Second Edition, written by Yohan Wadia.  This book will help you create a highly secure, fault-tolerant, and scalable Cloud environment for your applications to run on.[/box] AWS CloudTrail provides you with the ability to log every single action taken by a user, service, role, or even API, from within your AWS account. Each action recorded is treated as an event which can then be analyzed for enhancing the security of your AWS environment. The following are some of the key benefits that you can obtain by enabling CloudTrail for your AWS accounts: In-depth visibility: Using CloudTrail, you can easily gain better insights into your account's usage by recording each user's activities, such as which user initiated a new resource creation, from which IP address was this request initiated, which resources were created and at what time, and much more! Easier compliance monitoring: With CloudTrail, you can easily record and log events occurring within your AWS account, whether they may originate from the Management Console, or the AWS CLI, or even from other AWS tools and services. The best thing about this is that you can integrate CloudTrail with another AWS service, such as Amazon CloudWatch, to alert and respond to out-of-compliance events. Security automations: Automating responses to security threats not only enables you to mitigate the potential threats faster, but also provides you with a mechanism to stop all further attacks. The same can be applied to AWS CloudTrail as well! With its easy integration with Amazon CloudWatch events, you can now create corresponding Lambda functions that trigger automatically each time a compliance is not met, all in a matter of seconds! CloudTrail's essential concepts and terminologies With these key points in mind, let's have a quick look at some of CloudTrail's essential concepts and terminologies: Events Events are the basic unit of measurement in CloudTrail. Essentially, an event is nothing more than a record of a particular activity either initiated by the AWS services, roles, or even an AWS user. These activities are all logged as API calls that can originate from the Management Console, the AWS SDK, or even the AWS CLI as well. By default, events are stored by CloudTrail with S3 buckets for a period of 7 days. You can view, search, and even download these events by leveraging the events history feature provided by CloudTrail. Trails Trails are essentially the delivery mechanism, using which events are dumped to S3 buckets. You can use these trails to log specific events within specific buckets, as well as to filter events and encrypt the transmitted log files. By default, you can have a maximum of five trails created per AWS region, and this limit cannot by increased. CloudTrail Logs Once your CloudTrail starts capturing events, it sends these events to an S3 bucket in the form of a CloudTrail Log file. The log files are JSON text files that are compressed using the .gzip format. Each file can contain one or more events within itself. Here is a simple representation of what a CloudTrail Log looks like. In this case, the event was created when I tried to add an existing user by the name of Mike to an administrator group using the AWS Management Console: {"Records": [{ "eventVersion": "1.0", "userIdentity": { "type": "IAMUser", "principalId": "12345678", "arn": "arn:aws:iam::012345678910:user/yohan", "accountId": "012345678910", "accessKeyId": "AA34FG67GH89", "userName": "Alice", "sessionContext": {"attributes": { "mfaAuthenticated": "false", "creationDate": "2017-11-08T13:01:44Z" }} }, "eventTime": "2017-11-08T13:09:44Z", "eventSource": "iam.amazonaws.com", "eventName": "AddUserToGroup", "awsRegion": "us-east-1", "sourceIPAddress": "127.0.0.1", "userAgent": "AWSConsole", "requestParameters": { "userName": "Mike", "groupName": "administrator" }, "responseElements": null }]} You can view your own CloudTrail Log files by visiting the S3 bucket that you specify during the trail's creation. Each log file is named uniquely using the following format: AccountID_CloudTrail_RegionName_YYYYMMDDTHHmmZ_UniqueString.json.gz Where: AccountID: Your AWS account ID. RegionName: AWS region where the event was captured: us-east-1, and so on. YYYYMMDDTTHHmmz: Specifies the year, month, day, hour (24 hours), minutes, and seconds. The z indicates time in UTC. UniqueString: A randomly generated 16-character-long string that is simply used so that there is no overwriting of the log files. With the basics in mind, let's quickly have a look at how you can get started with CloudTrail for your own AWS environments! Creating your first CloudTrail Trail To get started, log in to your AWS Management Console and filter the CloudTrail service from the AWS services filter. On the CloudTrail dashboard, select the Create Trail option to get started: This will bring up the Create Trail wizard. Using this wizard, you can create a maximum of five-trails per region. Type a suitable name for the Trail into the Trail name field, to begin with. Next, you can either opt to Apply trail to all regions or only to the region out of which you are currently operating. Selecting all regions enables CloudTrail to record events from each region and dump the corresponding log files into an S3 bucket that you specify. Alternatively, selecting to record out of one region will only capture the events that occur from the region out of which you are currently operating. In my case, I have opted to enable the Trail only for the region I'm currently working out of. In the subsequent sections, we will learn how to change this value using the AWS CLI: Next, in the Management events section, select the type of events you wish to capture from your AWS environment. By default, CloudTrail records all management events that occur within your AWS account. These events can be API operations, such as events caused due to the invocation of an EC2 RunInstances or TerminateInstances operation, or even non-API based events, such as a user logging into the AWS Management Console, and so on. For this particular use case, I've opted to record All management events. Selecting the Read-only option will capture all the GET API operations, whereas the Write-only option will capture only the PUT API operations that occur within your AWS environment. Moving on, in the Storage location section, provide a suitable name for the S3 bucket that will store your CloudTrail Log files. This bucket will store all your CloudTrail Log files, irrespective of the regions the logs originated from. You can alternatively select an existing bucket from the S3 bucket selection field: Next, from the Advanced section, you can optionally configure a Log file prefix. By default, the logs will automatically get stored under a folder-like hierarchy that is usually of the form AWSLogs/ACCOUNT_ID/CloudTrail/REGION. You can also opt to Encrypt log files with the help of an AWS KMS key. Enabling this feature is highly recommended for production use. Selecting Yes in the Enable log file validation field enables you to verify the integrity of the delivered log files once they are delivered to the S3 bucket. Finally, you can even enable CloudTrail to send you notifications each time a new log file is delivered to your S3 bucket by selecting Yes against the Send SNS notification for every log file delivery option. This will provide you with an additional option to either select a predefined SNS topic or alternatively create a new one specifically for this particular CloudTrail. Once all the required fields are filled in, click on Create to continue. With this, you should be able to see the newly created Trail by selecting the Trails option from the CloudTrail dashboard's navigation pane, as shown in the following screenshot: We learned to create a new trail and enable notifications each time a new log file is delivered. If you are interested to learn more about CloudTrail Logs and AWS Config you may refer to this book  AWS Administration - The Definitive Guide - Second Edition. AWS SAM (AWS Serverless Application Model) is now open source! How to run Lambda functions on AWS Greengrass AWS Greengrass brings machine learning to the edge
Read more
  • 0
  • 0
  • 4399

article-image-amazon-simple-notification-service
Vijin Boricha
30 Apr 2018
5 min read
Save for later

Using Amazon Simple Notification Service (SNS) to create an SNS topic

Vijin Boricha
30 Apr 2018
5 min read
Simple Notification Service is a managed web service that you, as an end user, can leverage to send messages to various subscribing endpoints. SNS works in a publisher–subscriber or producer and consumer model, where producers create and send messages to a particular topic, which is in turn consumed by one or more subscribers over a supported set of protocols. At the time of writing this book, SNS supports HTTP, HTTPS, email, push notifications in the form of SMS, as well as AWS Lambda and Amazon SQS, as the preferred modes of subscribers. In today's tutorial, we will learn about Amazon Simple Notification Service (SNS) and how to create your very own simple SNS topics: SNS is a really simple and yet extremely useful service that you can use for a variety of purposes, the most common being pushing notifications or system alerts to cloud administrators whenever a particular event occurs. We have been using SNS throughout this book for this same purpose; however, there are many more features and use cases that SNS can be leveraged for. For example, you can use SNS to send out promotional emails or SMS to a large group of targeted audiences or even use it as a mobile push notification service where the messages are pushed directly to your Android or IOS applications. With this in mind, let's quickly go ahead and create a simple SNS topic of our own: To do so, first log in to your AWS Management Console and, from the Filter option, filter out SNS service. Alternatively, you can also access the SNS dashboard by selecting https://console.aws.amazon.com/sns. If this is your first time with SNS, simply select the Get Started option to begin. Here, at the SNS dashboard, you can start off by selecting the Create topic option, as shown in the following screenshot: Once selected, you will be prompted to provide a suitable Topic name and its corresponding Display name. Topics form the core functionality for SNS. You can use topics to send messages to a particular type of subscribing consumer. Remember, a single topic can be subscribed by more than one consumer. Once you have typed in the required fields, select the Create topic option to complete the process. That's it! Simple, isn't it? Having created your topic, you can now go ahead and associate it with one or more subscribers. To do so, first we need to create one or more subscriptions. Select the Create subscription option provided under the newly created topic, as shown in the following screenshot: Here, in the Create subscription dialog box, select a suitable Protocol that will subscribe to the newly created topic. In this case, I've selected Email as the Protocol. Next, provide a valid email address in the subsequent Endpoint field. The Endpoint field will vary based on the selected protocol. Once completed, click on the Create subscription button to complete the process. With the subscription created, you will now have to validate the subscription. This can be performed by launching your email application and selecting the Confirm subscription link in the mail that you would have received. Once the subscription is confirmed, you will be redirected to a confirmation page where you can view the subscribed topic's name as well as the subscription ID, as shown in the following screenshot: You can use the same process to create and assign multiple subscribers to the same topic. For example, select the Create subscription option, as performed earlier, and from the Protocol drop-down list, select SMS as the new protocol. Next, provide a valid phone number in the subsequent Endpoint field. The number can be prefixed by your country code, as shown in the following screenshot. Once completed, click on the Create subscription button to complete the process: With the subscriptions created successfully, you can now test the two by publishing a message to your topic. To do so, select the Publish to topic option from your topics page. Once a message is published here, SNS will attempt to deliver that message to each of its subscribing endpoints; in this case, to the email address as well as the phone number. Type in a suitable Subject name followed by the actual message that you wish to send. Note that if your character count exceeds 160 for an SMS, SNS will automatically send another SMS with the remainder of the character count. You can optionally switch the Message format between Raw and JSON to match your requirements. Once completed, select Publish Message. Check your email application once more for the published message. You should receive an mail, as shown in the following screenshot: Similarly, you can create and associate one or more such subscriptions to each of the topics that you create. We learned about creating simple Amazon SNS topics. You read an excerpt from the book AWS Administration - The Definitive Guide - Second Edition written by Yohan Wadia.  This book will help you build a highly secure, fault-tolerant, and scalable Cloud environment. Getting to know Amazon Web Services AWS IoT Analytics: The easiest way to run analytics on IoT data How to set up a Deep Learning System on Amazon Web Services    
Read more
  • 0
  • 0
  • 4448
Unlock access to the largest independent learning library in Tech for FREE!
Get unlimited access to 7500+ expert-authored eBooks and video courses covering every tech area you can think of.
Renews at $19.99/month. Cancel anytime
article-image-run-lambda-functions-on-aws-greengrass
Vijin Boricha
27 Apr 2018
7 min read
Save for later

How to run Lambda functions on AWS Greengrass

Vijin Boricha
27 Apr 2018
7 min read
AWS Greengrass is a form of edge computing service that extends the cloud's functionality to your IoT devices by allowing data collection and analysis closer to its point of origin. This is accomplished by executing AWS Lambda functions locally on the IoT device itself, while still using the cloud for management and analytics. Today, we will learn how to leverage AWS Greengrass to run simple lambda functions on an IoT device. How does this help a business? Well to start with, using AWS Greengrass you are now able to respond to locally generated events in near real time. With Greengrass, you can program your IoT devices to locally process and filter data and only transmit the important chunks back to AWS for analysis. This also has a direct impact on the costs as well as the amount of data transmitted back to the cloud. Here are the core components of AWS Greengrass: Greengrass Core (GGC) software: The Greengrass Core software is a packaged module that consists of a runtime to allow executions of Lambda functions, locally. It also contains an internal message broker and a deployment agent that periodically notifies the AWS Greengrass service about the device's configuration, state, available updates, and so on. The software also ensures that the connection between the device and the IoT service is secure with the help of keys and certificates. Greengrass groups: A Greengrass group is a collection of Greengrass Core settings and definitions that are used to manage one or more Greengrass-backed IoT devices. The groups internally comprise a few other components, namely: Greengrass group definition: A collection of information about your Greengrass group Device definition: A collection of IoT devices that are a part of a Greengrass group Greengrass group settings: Contains connection as well as configuration information along with the necessary IAM Roles required for interacting with other AWS services Greengrass Core: The IoT device itself Lambda functions: A list of Lambda functions that can be deployed to the Greengrass Core. Subscriptions: A collection of a message source, a message target and an MQTT topic to transmit the messages. The source or targets can be either the IoT service, a Lambda function or even the IoT device itself. Greengrass Core SDK: Greengrass also provides an SDK which you can use to write and run Lambda functions on Greengrass Core devices. The SDK currently supports Java 8, Python 2.7, and Node.js 6.10. With this key information in mind, let's go ahead and deploy our very own Greengrass Core on an IoT device. Running Lambda functions on AWS Greengrass With the Greengrass Core software up and running on your IoT device, we can now go ahead and run a simple Lambda function on it! For this particular section, we will be leveraging an AWS Lambda blueprint that prints a simple Hello World message: To get started, first, we will need to create our Lambda function. From the AWS Management Console, filter out the Lambda service using the Filter option or alternatively, select this URL: https://console.aws.amazon.com/lambda/home. Ensure that the Lambda function is launched from the same region as that of the AWS Greengrass. In this case, we are using the US-East-1 (N. Virginia) region. On the AWS Lambda console landing page, select the Create function option to get started. Since we are going to be leveraging an existing function blueprint for this use case, select the Blueprints option provided on the Create function page. Use the filter to find a blueprint with the name greengrass-hello-world. There are two templates present to date that match this name, one function is based on Python while the other is based on Node.js. For this particular section, select the greengrass-hello-world Python function and click on Configure to proceed. Fill out the required details for the new function, such as a Name followed by a valid Role. For this section, go ahead and select the Create new role from template option. Provide a suitable Role name and finally, from the Policy templates drop-down list, select the AWS IoT Button Permissions role. Once completed, click on Create function to complete the function's creation process. But before you move on to associating this function with your AWS Greengrass, you will also need to create a new version out of this function. Select the Publish new version option from the Actions tab. Provide a suitable Version description text and click on Publish once done. Your function is now ready for AWS Greengrass. Now, head back to the AWS IoT dashboard and select the newly deployed Greengrass group from the Groups option present on the navigation pane. From the Greengrass group page, select the Lambdas option from the navigation pane followed by the Add Lambda option, as shown in the following screenshot: On the Add a Lambda to your Greengrass group, you can choose to either Create a new Lambda function or Use an existing Lambda function as well. Since we have already created our function, select the Use existing function option. In the next page, select your Greengrass Lambda function and click Next to proceed. Finally, select the version of the deployed function and click on Finish once done. To finish things, we will need to create a new subscription between the Lambda function (source) and the AWS IoT service (destination). Select the Subscriptions option from the same Greengrass group page, as shown. Click on Add Subscription to proceed: On the Select your source and target page, select the newly deployed Lambda function as the source, followed by the IoT cloud as the target. Click on Next once done. You can provide an Optional topic filter as well, to filter messages published on the messaging queue. In this case, we have provided a simple hello/world as the filter for this scenario. Click on Finish once done to complete the subscription configuration. With all the pieces in place, it's now time to deploy our Lambda function over to the Greengrass Core. To do so, select the Deployments option and from the Actions drop-down list, select the Deploy option, as shown in the following screenshot: The deployment takes a few seconds to complete. Once done, verify the status of the deployment by viewing the Status column. The Status should show Successfully completed. With the function now deployed, test the setup by using the MQTT client provided by AWS IoT, as done before. Remember to enter the same hello/world topic name in the subscription topic field and click on Publish to topic once done. If all goes well, you should receive a custom Hello World message from the Greengrass Core as depicted in the following screenshot: This was just a high-level view of what you can achieve with Greengrass and Lambda. You can leverage Lambda for performing all kinds of preprocessing on data on your IoT device itself, thus saving a tremendous amount of time, as well as costs. With this, we come to the end of this post. Stay tuned for our next post where we will look at ways to effectively monitor IoT devices. We leveraged AWS Greengrass and Lambda to develop a cost-effective and speedy solution. You read an excerpt from the book AWS Administration - The Definitive Guide - Second Edition written by Yohan Wadia.  Whether you are a seasoned system admin or a rookie, this book will help you learn all the skills you need to work with the AWS cloud.  
Read more
  • 0
  • 0
  • 6890

article-image-creating-deploying-amazon-redshift-cluster
Vijin Boricha
26 Apr 2018
7 min read
Save for later

Creating and deploying an Amazon Redshift cluster

Vijin Boricha
26 Apr 2018
7 min read
Amazon Redshift is one of the database as a service (DBaaS) offerings from AWS that provides a massively scalable data warehouse as a managed service, at significantly lower costs. The data warehouse is based on the open source PostgreSQL database technology. However, not all features offered in PostgreSQL are present in Amazon Redshift. Today, we will learn about Amazon Redshift and perform a few steps to create a fully functioning Amazon Redshift cluster. We will also take a look at some of the essential concepts and terminologies that you ought to keep in mind when working with Amazon Redshift: Clusters: Just like Amazon EMR, Amazon Redshift also relies on the concept of clusters. Clusters here are logical containers containing one or more instances or compute nodes and one leader node that is responsible for the cluster's overall management. Here's a brief look at what each node provides: Leader node: The leader node is a single node present in a cluster that is responsible for orchestrating and executing various database operations, as well as facilitating communication between the database and associate client programs. Compute node: Compute nodes are responsible for executing the code provided by the leader node. Once executed, the compute nodes share the results back to the leader node for aggregation. Amazon Redshift supports two types of compute nodes: dense storage nodes and dense compute nodes. The dense storage nodes provide standard hard disk drives for creating large data warehouses; whereas, the dense compute nodes provide higher performance SSDs. You can start off by using a single node that provides 160 GB of storage and scale up to petabytes by leveraging one or more 16 TB capacity instances as well. Node slices: Each compute node is partitioned into one or more smaller chunks or slices by the leader node, based on the cluster's initial size. Each slice contains a portion of the compute nodes memory, CPU and disk resource, and uses these resources to process certain workloads that are assigned to it. The assignment of workloads is again performed by the leader node. Databases: As mentioned earlier, Amazon Redshift provides a scalable database that you can leverage for a data warehouse, as well as analytical purposes. With each cluster that you spin in Redshift, you can create one or more associated databases with it. The database is based on the open source relational database PostgreSQL (v8.0.2) and thus, can be used in conjunction with other RDBMS tools and functionalities. Applications and clients can communicate with the database using standard PostgreSQL JDBC and ODBC drivers. Here is a representational image of a working data warehouse cluster powered by Amazon Redshift: With this basic information in mind, let's look at some simple and easy to follow steps using which you can set up and get started with your Amazon Redshift cluster. Getting started with Amazon Redshift In this section, we will be looking at a few simple steps to create a fully functioning Amazon Redshift cluster that is up and running in a matter of minutes: First up, we have a few prerequisite steps that need to be completed before we begin with the actual set up of the Redshift cluster. From the AWS Management Console, use the Filter option to filter out IAM. Alternatively, you can also launch the IAM dashboard by selecting this URL: https://console.aws.amazon.com/iam/. Once logged in, we need to create and assign a role that will grant our Redshift cluster read-only access to Amazon S3 buckets. This role will come in handy later on in this chapter when we load some sample data on an Amazon S3 bucket and use Amazon Redshift's COPY command to copy the data locally into the Redshift cluster for processing. To create the custom role, select the Role option from the IAM dashboards' navigation pane. On the Roles page, select the Create role option. This will bring up a simple wizard using which we will create and associate the required permissions to our role. Select the Redshift option from under the AWS Service group section and opt for the Redshift - Customizable option provided under the Select your use case field. Click Next to proceed with the set up. On the Attach permissions policies page, filter and select the AmazonS3ReadOnlyAccess permission. Once done, select Next: Review. In the final Review page, type in a suitable name for the role and select the Create Role option to complete the process. Make a note of the role's ARN as we will be requiring this in the later steps. Here is snippet of the role policy for your reference: { "Version": "2012-10-17", "Statement": [ { "Effect": "Allow", "Action": [ "s3:Get*", "s3:List*" ], "Resource": "*" } ] } With the role created, we can now move on to creating the Redshift cluster. To do so, log in to the AWS Management Console and use the Filter option to filter out Amazon Redshift. Alternatively, you can also launch the Redshift dashboard by selecting this URL: https://console.aws.amazon.com/redshift/. Select Launch Cluster to get started with the process. Next, on the CLUSTER DETAILS page, fill in the required information pertaining to your cluster as mentioned in the following list: Cluster identifier: A suitable name for your new Redshift cluster. Note that this name only supports lowercase strings. Database name: A suitable name for your Redshift database. You can always create more databases within a single Redshift cluster at a later stage. By default, a database named dev is created if no value is provided: Database port: The port number on which the database will accept connections. By default, the value is set to 5439, however you can change this value based on your security requirements. Master user name: Provide a suitable username for accessing the database. Master user password: Type in a strong password with at least one uppercase character, one lowercase character and one numeric value. Confirm the password by retyping it in the Confirm password field. Once completed, hit Continue to move on to the next step of the wizard. On the NODE CONFIGURATION page, select the appropriate Node type for your cluster, as well as the Cluster type based on your functional requirements. Since this particular cluster setup is for demonstration purposes, I've opted to select the dc2.large as the Node type and a Single Node deployment with 1 compute node. Click Continue to move on the next page once done. It is important to note here that the cluster that you are about to launch will be live and not running in a sandbox-like environment. As a result, you will incur the standard Amazon Redshift usage fees for the cluster until you delete it. You can read more about Redshift's pricing at: https://aws.amazon.com/redshift/pricing/. In the ADDITIONAL CONFIGURATION page, you can configure add-on settings, such as encryption enablement, selecting the default VPC for your cluster, whether or not the cluster should have direct internet access, as well as any preferences for a particular Availability Zone out of which the cluster should operate. Most of these settings do not require any changes at the moment and can be left to their default values. The only changes required on this page is associating the previously created IAM role with the cluster. To do so, from the Available Roles drop-down list, select the custom Redshift role that we created in our prerequisite section. Once completed, click on Continue. Review the settings and changes on the Review page and select the Launch Cluster option when completed. The cluster takes a few minutes to spin up depending on whether or not you have opted for a single instance deployment or multiple instances. Once completed, you should see your cluster listed on the Clusters page, as shown in the following screenshot. Ensure that the status of your cluster is shown as healthy under the DB Health column. You can additionally make a note of the cluster's endpoint as well, for accessing it programmatically: With the cluster all set up, the next thing to do is connect to the same. This Amazon Redshift tutorial has been taken from AWS Administration - The Definitive Guide - Second Edition. Read More Amazon S3 Security access and policies How to run Lambda functions on AWS Greengrass AWS Fargate makes Container infrastructure management a piece of cake
Read more
  • 0
  • 0
  • 5220

article-image-erasure-coding-cold-storage
Packt
07 Jun 2017
20 min read
Save for later

Erasure coding for cold storage

Packt
07 Jun 2017
20 min read
In this article by Nick Frisk, author of the book Mastering Ceph, we will get acquainted with erasure coding. Ceph's default replication level provides excellent protection against data loss by storing three copies of your data on different OSD's. The chance of losing all three disks that contain the same objects within the period that it takes Ceph to rebuild from a failed disk, is verging on the extreme edge of probability. However, storing 3 copies of data vastly increases both the purchase cost of the hardware but also associated operational costs such as power and cooling. Furthermore, storing copies also means that for every client write, the backend storage must write three times the amount of data. In some scenarios, either of these drawbacks may mean that Ceph is not a viable option. Erasure codes are designed to offer a solution. Much like how RAID 5 and 6 offer increased usable storage capacity over RAID1, erasure coding allows Ceph to provide more usable storage from the same raw capacity. However also like the parity based RAID levels, erasure coding brings its own set of disadvantages. (For more resources related to this topic, see here.) In this article you will learn: What is erasure coding and how does it work Details around Ceph's implementation of erasure coding How to create and tune an erasure coded RADOS pool A look into the future features of erasure coding with Ceph Kraken release What is erasure coding Erasure coding allows Ceph to achieve either greater usable storage capacity or increase resilience to disk failure for the same number of disks versus the standard replica method. Erasure coding achieves this by splitting up the object into a number of parts and then also calculating a type of Cyclic Redundancy Check, the Erasure code, and then storing the results in one or more extra parts. Each part is then stored on a separate OSD. These parts are referred to as k and m chunks, where k refers to the number of data shards and m refers to the number of erasure code shards. As in RAID, these can often be expressed in the form k+m or 4+2 for example. In the event of an OSD failure which contains an objects shard which isone of the calculated erasure codes, data is read from the remaining OSD's that store data with no impact. However, in the event of an OSD failure which contains the data shards of an object, Ceph can use the erasure codes to mathematically recreate the data from a combination of the remaining data and erasure code shards. k+m The more erasure code shards you have, the more OSD failure's you can tolerate and still successfully read data. Likewise the ratio of k to m shards each object is split into, has a direct effect on the percentage of raw storage that is required for each object. A 3+1 configuration will give you 75% usable capacity, but only allows for a single OSD failure and so would not be recommended. In comparison a three way replica pool, only gives you 33% usable capacity. 4+2 configurations would give you 66% usable capacity and allows for 2 OSD failures. This is probably a good configuration for most people to use. At the other end of the scale a 18+2 would give you 90% usable capacity and still allows for 2 OSD failures. On the surface this sounds like an ideal option, but the greater total number of shards comes at a cost. The higher the number of total shards has a negative impact on performance and also an increased CPU demand. The same 4MB object that would be stored as a whole single object in a replicated pool, is now split into 20 x 200KB chunks, which have to be tracked and written to 20 different OSD's. Spinning disks will exhibit faster bandwidth, measured in MB/s with larger IO sizes, but bandwidth drastically tails off at smaller IO sizes. These smaller shards will generate a large amount of small IO and cause additional load on some clusters. Also its important not to forget that these shards need to be spread across different hosts according to the CRUSH map rules, no shard belonging to the same object can be stored on the same host as another shard from the same object. Some clusters may not have a sufficient number hosts to satisfy this requirement. Reading back from these high chunk pools is also a problem. Unlike in a replica pool where Ceph can read just the requested data from any offset in an object, in an Erasure pool, all shards from all OSD's have to be read before the read request can be satisfied. In the 18+2 example this can massively amplify the amount of required disk read ops and average latency will increase as a result. This behavior is a side effect which tends to only cause a performance impact with pools that use large number of shards. A 4+2 configuration in some instances will get a performance gain compared to a replica pool, from the result of splitting an object into shards.As the data is effectively striped over a number of OSD's, each OSD is having to write less data and there is no secondary and tertiary replica's to write. How does erasure coding work in Ceph As with Replication, Ceph has a concept of a primary OSD, which also exists when using erasure coded pools. The primary OSD has the responsibility of communicating with the client, calculating the erasure shards and sending them out to the remaining OSD's in the Placement Group (PG) set. This is illustrated in the diagram below: If an OSD in the set is down, the primary OSD, can use the remaining data and erasure shards to reconstruct the data, before sending it back to the client. During read operations the primary OSD requests all OSD's in the PG set to send their shards. The primary OSD uses data from the data shards to construct the requested data, the erasure shards are discarded. There is a fast read option that can be enabled on erasure pools, which allows the primary OSD to reconstruct the data from erasure shards if they return quicker than data shards. This can help to lower average latency at the cost of slightly higher CPU usage. The diagram below shows how Ceph reads from an erasure coded pool: The next diagram shows how Ceph reads from an erasure pool, when one of the data shards is unavailable. Data is reconstructed by reversing the erasure algorithm using the remaining data and erasure shards. Algorithms and profiles There are a number of different Erasure plugins you can use to create your erasure coded pool. Jerasure The default erasure plugin in Ceph is the Jerasure plugin, which is a highly optimized open source erasure coding library. The library has a number of different techniques that can be used to calculate the erasure codes. The default is Reed Solomon and provides good performance on modern processors which can accelerate the instructions that the technique uses. Cauchy is another technique in the library, it is a good alternative to Reed Solomon and tends to perform slightly better. As always benchmarks should be conducted before storing any production data on an erasure coded pool to identify which technique best suits your workload. There are also a number of other techniques that can be used, which all have a fixed number of m shards. If you are intending on only having 2 m shards, then they can be a good candidate, as there fixed size means that optimization's are possible lending to increased performance. In general the jerasure profile should be prefer in most cases unless another profile has a major advantage, as it offers well balanced performance and is well tested. ISA The ISA library is designed to work with Intel processors and offers enhanced performance. It too supports both Reed Solomon and Cauchy techniques. LRC One of the disadvantages of using erasure coding in a distributed storage system is that recovery can be very intensive on networking between hosts. As each shard is stored on a separate host, recovery operations require multiple hosts to participate in the process. When the crush topology spans multiple racks, this can put pressure on the inter rack networking links. The LRC erasure plugin, which stands for Local Recovery Codes, adds an additional parity shard which is local to each OSD node. This allows recovery operations to remain local to the node where a OSD has failed and remove the need for nodes to receive data from all other remaining shard holding nodes. However the addition of these local recovery codes does impact the amount of usable storage for a given number of disks. In the event of multiple disk failures, the LRC plugin has to resort to using global recovery as would happen with the jerasure plugin. SHingled Erasure Coding The SHingled Erasure Coding (SHEC) profile is designed with similar goals to the LRC plugin, in that it reduces the networking requirements during recovery. However instead of creating extra parity shards on each node, SHEC shingles the shards across OSD's in an overlapping fashion. The shingle part of the plugin name represents the way the data distribution resembles shingled tiles on a roof of a house. By overlapping the parity shards across OSD's, the SHEC plugin reduces recovery resource requirements for both single and multiple disk failures. Where can I use erasure coding Since the Firefly release of Ceph in 2014, there has been the ability to create a RADOS pool using erasure coding. There is one major thing that you should be aware of, the erasure coding support in RADOS does not allow an object to be partially updated. You can write to an object in an erasure pool, read it back and even overwrite it whole, but you cannot update a partial section of it. This means that erasure coded pools can't be used for RBD and CephFS workloads and is limited to providing pure object storage either via the Rados Gateway or applications written to use librados. The solution at the time was to use the cache tiering ability which was released around the same time, to act as a layer above an erasure coded pools that RBD could be used. In theory this was a great idea, in practice, performance was extremely poor. Every time an object was required to be written to, the whole object first had to be promoted into the cache tier. This act of promotion probably also meant that another object somewhere in the cache pool was evicted. Finally the object now in the cache tier could be written to. This whole process of constantly reading and writing data between the two pools meant that performance was unacceptable unless a very high percentage of the data was idle. During the development cycle of the Kraken release, an initial implementation for support for direct overwrites on n erasure coded pool was introduced. As of the final Kraken release, support is marked as experimental and is expected to be marked as stable in the following release. Testing of this feature will be covered later in this article. Creating an erasure coded pool Let's bring our test cluster up again and switch into SU mode in Linux so we don't have to keep prepending sudo to the front of our commands Erasure coded pools are controlled by the use of erasure profiles, these control how many shards each object is broken up into including the split between data and erasure shards. The profiles also include configuration to determine what erasure code plugin is used to calculate the hashes. The following plugins are available to use <list of plugins> To see a list of the erasure profiles run # cephosd erasure-code-profile ls You can see there is a default profile in a fresh installation of Ceph. Lets see what configuration options it contains # cephosd erasure-code-profile get default The default specifies that it will use the jerasure plugin with the Reed Solomon error correcting codes and will split objects into 2 data shards and 1 erasure shard. This is almost perfect for our test cluster, however for the purpose of this exercise we will create a new profile. # cephosd erasure-code-profile set example_profile k=2 m=1 plugin=jerasure technique=reed_sol_van # cephosd erasure-code-profile ls You can see our new example_profile has been created. Now lets create our erasure coded pool with this profile: # cephosd pool create ecpool 128 128 erasure example_profile The above command instructs Ceph to create a new pool called ecpool with a 128 PG's. It should be an erasure coded pool and should use our "example_profile" we previously created. Lets create an object with a small text string inside it and the prove the data has been stored by reading it back: # echo "I am test data for a test object" | rados --pool ecpool put Test1 – # rados --pool ecpool get Test1 - That proves that the erasure coded pool is working, but it's hardly the most exciting of discoveries. Lets have a look to see if we can see what's happening at a lower level. First, find out what PG is holding the object we just created # cephosd map ecpoolTest1 The result of the above command tells us that the object is stored in PG 3.40 on OSD's1, 2 and 0. In this example Ceph cluster that's pretty obvious as we only have 3 OSD's, but in larger clusters that is a very useful piece of information. We can now look at the folder structure of the OSD's and see how the object has been split. The PG's will likely be different on your test cluster, so make sure the PG folder structure matches the output of the "cephosd map" command above. ls -l /var/lib/ceph/osd/ceph-2/current/1.40s0_head/ # ls -l /var/lib/ceph/osd/ceph-1/current/1.40s1_head/ # ls -l /var/lib/ceph/osd/ceph-0/current/1.40s2_head/                 total 4 Notice how the PG directory names have been appended with the shard number, replicated pools just have the PG number as their directory name. If you examine the contents of the object files, you will see our text string that we entered into the object when we created it. However due to the small size of the text string, Ceph has padded out the 2nd shard with null characters and the erasure shard hence will contain the same as the first. You can repeat this example with a new object containing larger amounts of text to see how Ceph splits the text into the shards and calculates the erasure code. Overwrites on erasure code pools with Kraken Introduced for the first time in the Kraken release of Cephas an experimental feature, was the ability to allow partial overwrites on erasure coded pools. Partial overwrite support allows RBD volumes to be created on erasure coded pools, making better use of raw capacity of the Ceph cluster. In parity RAID, where a write request doesn't span the entire stripe, a read modify write operation is required. This is needed as the modified data chunks will mean the parity chunk is now incorrect. The RAID controller has to read all the current chunks in the stripe, modify them in memory, calculate the new parity chunk and finally write this back out to the disk. Ceph is also required to perform this read modify write operation, however the distributed model of Ceph increases the complexity of this operation.When the primary OSD for a PG receives a write request that will partially overwrite an existing object, it first works out which shards will be not be fully modified by the request and contacts the relevant OSD's to request a copy of these shards. The primary OSD then combines these received shards with the new data and calculates the erasure shards. Finally the modified shards are sent out to the respective OSD's to be committed. This entire operation needs to conform the other consistency requirements Ceph enforces, this entails the use of temporary objects on the OSD, should a condition arise that Ceph needs to roll back a write operation. This partial overwrite operation, as can be expected, has a performance impact. In general the smaller the write IO's, the greater the apparent impact. The performance impact is a result of the IO path now being longer, requiring more disk IO's and extra network hops. However, it should be noted that due to the striping effect of erasure coded pools, in the scenario where full stripe writes occur, performance will normally exceed that of a replication based pool. This is simply down to there being less write amplification due to the effect of striping. If performance of an Erasure pool is not suitable, consider placing it behind a cache tier made up of a replicated pool. Despite partial overwrite support coming to erasure coded pools in Ceph, not every operation is supported. In order to store RBD data on an erasure coded pool, a replicated pool is still required to hold key metadata about the RBD. This configuration is enabled by using the –data-pool option with the rbd utility. Partial overwrite is also not recommended to be used with Filestore. Filestore lacks several features that partial overwrites on erasure coded pools uses, without these features extremely poor performance is experienced. Demonstration This feature requires the Kraken release or newer of Ceph. If you have deployed your test cluster with the Ansible and the configuration provided, you will be running Ceph Jewel release. The following steps show how to use Ansible to perform a rolling upgrade of your cluster to the Kraken release. We will also enable options to enable experimental options such as bluestore and support for partial overwrites on erasure coded pools. Edit your group_vars/ceph variable file and change the release version from Jewel to Kraken. Also add: ceph_conf_overrides: global: enable_experimental_unrecoverable_data_corrupting_features: "debug_white_box_testing_ec_overwrites bluestore" And to correct a small bug when using Ansible to deploy Ceph Kraken, add: debian_ceph_packages: - ceph - ceph-common - ceph-fuse To the bottom of the file run the following Ansible playbook: ansible-playbook -K infrastructure-playbooks/rolling_update.yml Ansible will prompt you to make sure that you want to carry out the upgrade, once you confirm by entering yes the upgrade process will begin. Once Ansible has finished, all the stages should be successful as shown below: Your cluster has now been upgraded to Kraken and can be confirmed by running ceph -v on one of yours VM's running Ceph. As a result of enabling the experimental options in the configuration file, every time you now run a Ceph command, you will be presented with the following warning. This is designed as a safety warning to stop you running these options in a live environment, as they may cause irreversible data loss. As we are doing this on a test cluster, that is fine to ignore, but should be a stark warning not to run this anywhere near live data. The next command that is required to be run is to enable the experimental flag which allows partial overwrites on erasure coded pools. DO NOT RUN THIS ON PRODUCTION CLUSTERS cephosd pool get ecpooldebug_white_box_testing_ec_overwrites true Double check you still have your erasure pool called ecpool and the default RBD pool # cephosdlspools 0 rbd,1ecpool, And now create the rbd. Notice that the actual RBD header object still has to live on a replica pool, but by providing an additional parameter we can tell Ceph to store data for this RBD on an erasure coded pool. rbd create Test_On_EC --data-pool=ecpool --size=1G The command should return without error and you now have an erasure coded backed RBD image. You should now be able to use this image with any librbd application. Note: Partial overwrites on Erasure pools require Bluestore to operate efficiently. Whilst Filestore will work, performance will be extremely poor. Troubleshooting the 2147483647 error An example of this error is shown below when running the ceph health detail command. If you see 2147483647 listed as one of the OSD's for an erasure coded pool, this normally means that CRUSH was unable to find a sufficient number of OSD's to complete the PG peering process. This is normally due to the number of k+m shards being larger than the number of hosts in the CRUSH topology. However, in some cases this error can still occur even when the number of hosts is equal or greater to the number of shards. In this scenario it's important to understand how CRUSH picks OSD's as candidates for data placement. When CRUSH is used to find a candidate OSD for a PG, it applies the crushmap to find an appropriate location in the crush topology. If the result comes back as the same as a previous selected OSD, Ceph will retry to generate another mapping by passing slightly different values into the crush algorithm. In some cases if there is a similar number of hosts to the number of erasure shards, CRUSH may run out of attempts before it can suitably find correct OSD mappings for all the shards. Newer versions of Ceph has mostly fixed these problems by increasing the CRUSH tunable choose_total_tries. Reproducing the problem In order to aid understanding of the problem in more detail, the following steps will demonstrate how to create an erasure coded profile that will require more shards than our 3 node cluster can support. Firstly, like earlier in the articlecreate a new erasure profile, but modify the k/m parameters to be k=3 m=1: $ cephosd erasure-code-profile set broken_profile k=3 m=1 plugin=jerasure technique=reed_sol_van And now create a pool with it: $ cephosd pool create broken_ecpool 128 128 erasure broken_profile If we look at the output from ceph -s, we will see that the PG's for this new pool are stuck in the creating state. The output of ceph health detail, shows the reason why and we see the 2147483647 error. If you encounter this error and it is a result of your erasure profile being larger than your number of hosts or racks, depending on how you have designed your crushmap. Then the only real solution is to either drop the number of shards, or increase number of hosts. Summary In this article you have learnt what erasure coding is and how it is implemented in Ceph. You should also have an understanding of the different configuration options possible when creating erasure coded pools and their suitability for different types of scenarios and workloads. Resources for Article: Further resources on this subject: Ceph Instant Deployment [article] Working with Ceph Block Device [article] GNU Octave: Data Analysis Examples [article]
Read more
  • 0
  • 0
  • 7062

article-image-microsoft-azure-stack-architecture
Packt
07 Jun 2017
13 min read
Save for later

The Microsoft Azure Stack Architecture

Packt
07 Jun 2017
13 min read
In this article by Markus Klein and Susan Roesner, authors of the book Azure Stack for Datacenters, we will help you to plan, build, run, and develop your own Azure-based datacenter running Azure Stack technology. The goal is that the technology in your datacenter will be a 100 percent consistent using Azure, which provides flexibility and elasticity to your IT infrastructure. We will learn about: Cloud basics The Microsoft Azure Stack Core management services Using Azure Stack Migrating services to Azure Stack (For more resources related to this topic, see here.) Cloud as the new IT infrastructure Regarding the technical requirements of today's IT, the cloud is always a part of the general IT strategy. It does not depend upon the region in which the company is working in, nor does it depend upon the part of the economy—99.9 percent of all companies have cloud technology already in their environment. The good question for a lot of CIOs is general: "To what extent do we allow cloud services, and what does that mean to our infrastructure?" So it's a matter of compliance, allowance, and willingness. The top 10most important questions for a CIO to prepare for the cloud are as follows: Are we allowed to save our data in the cloud? What classification of data can be saved in the cloud? How flexible are we regarding the cloud? Do we have the knowledge to work with cloud technology? How does our current IT setup and infrastructure fit into the cloud's requirements? Is our current infrastructure already prepared for the cloud? Are we already working with a cloud-ready infrastructure? Is our Internet bandwidth good enough? What does the cloud mean to my employees? Which technology should we choose? Cloud Terminology The definition of the term "cloud" is not simple, but we need to differentiate between the following: Private cloud: This is a highly dynamic IT infrastructure based on virtualization technology that is flexible and scalable. The resources are saved in a privately owned datacenter either in your company or a service provider of your choice. Public cloud: This is a shared offering of IT infrastructure services that are provided via the Internet. Hybrid cloud: This is a mixture of a private and public cloud. Depending on compliance or other security regulations, the services that could be run in a public datacenter are already deployed there, but the services that need to be stored inside the company are running there. The goal is to run these services on the same technology to provide the agility, flexibility, and scalability to move services between public and private datacenters. In general, there are some big players within the cloud market (for example, Amazon Web Services, Google, Azure, and even Alibaba). If a company is quite Microsoft-minded from the infrastructure point of view, they should have a look at Microsoft Azure datacenters. Microsoft started in 2008 with their first datacenter, and today, they invest a billion dollars every month in Azure. As of today, there are about 34 official datacenters around the world that form Microsoft Azure, besides some that Microsoft does not talk about (for example, USGovernment Azure). There are some dedicated datacenters, such as the German Azure cloud, that do not have connectivity to Azure worldwide. Due to compliance requirements, these frontiers need to exist, but the technology of each Azure datacenter is the same although the services offered may vary. The following map gives an overview of the locations (so-called regions) in Azure as of today and provide an idea of which ones will be coming soon: The Microsoft cloud story When Microsoft started their public cloud, they decided that there must be a private cloud stack too, especially, to prepare their infrastructure to run in Azure sometime in the future. The first private cloud solution was the System Center suite, with System Center Orchestrator and Service Provider Foundation (SPF) and Service Manager as the self-service portal solution. Later on, Microsoft launched Windows Azure Pack for Windows Server. Today, Windows Azure Pack is available as a product focused on the private cloud and provides a self-service portal (the well-known old Azure Portal, code name red dog frontend), and it uses the System Center suite as its underlying technology: Microsoft Azure Stack In May2015, Microsoft formally announced a new solution that brings Azure to your datacenter. This solution was named Microsoft Azure Stack. To put it in one sentence: Azure Stack is the same technology with the same APIs and portal as Public Azure, but you could run it in your datacenter or in that of your service provider. With Azure Stack, System Center is completely gone because everything is the way it is in Azure now, and in Azure, there is no System Center at all. This is what the primary focus of this article is. The following diagram gives a current overview of the technical design of Azure Stack compared with Azure: The one and only difference between Azure Stack and Azure is the cloud infrastructure. In Azure, there are thousands of servers that are part of the solution; with Azure Stack, the number is slightly smaller. That's why there is the cloud-inspired infrastructure based on Windows Server, Hyper-V, and Azure technologies as the underlying technology stack. There is no System Center product in this stack anymore. This does not mean that it cannot be there (for example, SCOM for on-premise monitoring), but Azure Stack itself provides all functionality with the solution itself. For stability and functionality, Microsoft decided to provide Azure Stack as a so-called integrated system, so it will come to your door with the hardware stack included. The customer buys Azure Stack as a complete technology stack. At general availability(GA), the hardware OEMs are HPE, DellEMC, and Lenovo. In addition to this, there will be a one-host PoC deployment available for download that could be run as a proof of concept solution on every type of hardware, as soon as it meets the hardware requirements. Technical design Looking at the technical design a bit more in depth, there are some components that we need to dive deeper into: The general basis of Azure Stack is Windows Server 2016 technology, which builds the cloud-inspired infrastructure: Storage Spaces Direct (S2D) VXLAN Nano Server Azure Resource Manager (ARM) Storage Spaces Direct (S2D) Storage Spaces and Scale-Out File Server were technologies that came with Windows Server 2012. The lack of stability in the initial versions and the issues with the underlying hardware was a bad phase. The general concept was a shared storage setup using JBODs controlled from Windows Server 2012 Storage Spaces servers and a magic Scale-Out File Server cluster that acted as the single point of contact for storage: With Windows Server 2016, the design is quite different and the concept relies on a shared-nothing model, even with local attached storage: This is the storage design Azure Stack is coming up with as one of its main pillars. VXLANnetworking technology With Windows Server 2012, Microsoft introduced Software-Defined Networking(SDN)and the NVGRE technology. Hyper-V Network Virtualization supports Network Virtualization using Generic Routing Encapsulation (NVGRE) as the mechanism to virtualize IP addresses. In NVGRE, the virtual machine's packet is encapsulated inside another packet: Hyper-V Network Virtualization supports NVGRE as the mechanism to virtualize IP addresses. In NVGRE, the virtual machine's packet is encapsulated inside another packet. VXLAN comes as the new SDNv2protocol, is RFC compliant, and is supported by most network hardware vendors by default. The Virtual eXtensible Local Area Network (VXLAN) RFC 7348 protocol has been widely adopted in the marketplace, with support from vendors such as Cisco, Brocade, Arista, Dell, and HP. The VXLAN protocol uses UDP as the transport: Nano Server Nano Server offers a minimal-footprint headless version of Windows Server 2016. It completely excludes the graphical user interface, which means that it is quite small, headless, and easy to handle regarding updates and security fixes, but it doesn't provide the GUI expected by customers of Windows Server. Azure Resource Manager (ARM) The “magical” Azure Resource Manager is a 1-1 bit share with ARM from Azure, so it has the same update frequency and features that are available in Azure, too. ARM is a consistent management layer that saves resources, dependencies, inputs, and outputs as an idempotent deployment as a JSON file called an ARM template. This template defines the tastes of a deployment, whether it be VMs, databases, websites, or anything else. The goal is that once a template is designed, it can be run on each Azure-based cloud platform, including Azure Stack. ARM provides cloud consistency with the finest granularity, and the only difference between the clouds is the region the template is being deployed to and the corresponding REST endpoints. ARM not only provides a template for a logical combination of resources within Azure, it manages subscriptions and role-based access control(RBAC) and defines the gallery, metric, and usage data, too. This means quite simply that everything that needs to be done with Azure resources should be done with ARM. Not only does Azure Resource Manager design one virtual machine, it is responsible for setting up one to a bunch of resources that fit together for a specific service. Even ARM templates can be nested; this means they can depend on each other. When working with ARM, you should know the following vocabulary: Resource: Are source is a manageable item available in Azure Resource group: A resource group is the container of resources that fit together within a service Resource provider: A resource provider is a service that can be consumed within Azure Resource manager template: A resource manager template is the definition of a specific service Declarative syntax: Declarative syntax means that the template does not define the way to set up a resource; it just defines how the result and the resource itself has the feature to set up and configure itself to fulfill the syntax to create your own ARM templates, you need to fulfill the following minimum requirements: A test editor of your choice Visual Studio Community Edition Azure SDK Visual Studio Community Edition is available for free from the Internet. After setting these things up, you could start it and define your own templates: Setting up a simple blank template looks like this: There are different ways to get a template so that you can work on and modify it to fit your needs: Visual Studio templates Quick-start templates on GitHub Azure ARM templates You could export the ARM template directly from Azure Portal if the resource has been deployed: After clicking on View template, the following opens up: For further reading on ARM basics, the Getting started with Azure Resource Managerdocument is a good place to begin: http://aka.ms/GettingStartedWithARM. PowerShell Desired State Configuration We talked about ARM and ARM templates that define resources, but they are unable to design the waya VM looks inside, specify which software needs to be installed, and how the deployment should be done. This is why we need to have a look at VMextensions.VMextensions define what should be done after ARM deployment has finished. In general, the extension could be anything that's a script. The best practice is to use PowerShell and its add-on called Desired State Configuration (DSC). DSC defines—quite similarly to ARM—how the software needs to be installed and configured. The great concept is that it also monitors whether the desired state of a virtual machine is changing (for example, because an administrator uninstalls or reconfigures a machine). If it does, it makes sure within minutes whether the original state will be fulfilled again and rolls back the actions to the desired state: Migrating services to Azure Stack If you are running virtual machines today, you're already using a cloud-based technology, although we do not call it cloud today. Basically, this is the idea of a private cloud. If you are running Azure Pack today, you are quite near Azure Stack from the processes point of view but not the technology part. There is a solution called connectors for Azure Pack that lets you have one portal UI for both cloud solutions. This means that the customer can manage everything out of the Azure Stack Portal, although services run in Azure Pack as a legacy solution. Basically, there is no real migration path within Azure Stack. But the way to solve this is quite easy, because you could use every tool that you can use to migrate services to Azure. Azure website migration assistant The Azure website migration assistant will provide a high-level readiness assessment for existing websites. This report outlines sites that are ready to move and elements that may need changes, and it highlights unsupported features. If everything is prepared properly, the tool creates any website and associated database automatically and synchronizes the content. You can learn more about it at https://azure.microsoft.com/en-us/downloads/migration-assistant/: For virtual machines, there are two tools available: Virtual Machines Readiness Assessment Virtual Machines Optimization Assessment Virtual Machines Readiness Assessment The Virtual Machines Readiness Assessment tool will automatically inspect your environment and provide you with a checklist and detailed report on steps for migrating the environment to the cloud. The download location is https://azure.microsoft.com/en-us/downloads/vm-readiness-assessment/. If you run the tool, you will get an output like this: Virtual Machines Optimization Assessment The Virtual Machine Optimization Assessment tool will at first start with a questionnaire and ask several questions about your deployment. Then, it will create an automated data collection and analysis of your Azure VMs. It generates a custom report with tenprioritized recommendations across six focus areas. These areas are security and compliance, performance and scalability, and availability and business continuity. The download location ishttps://azure.microsoft.com/en-us/downloads/vm-optimization-assessment/. Summary Azure Stack provides a real Azure experience in your datacenter. The UI, administrative tools, and even third-party solutions should work properly. The design of Azure Stack is a very small instance of Azure with some technical design modifications, especially regarding the compute, storage, and network resource providers. These modifications give you a means to start small, think big, and deploy large when migrating services directly to public Azure sometime in the future, if needed. The most important tool for planning, describing, defining, and deploying Azure Stack services is Azure Resource Manager, just like in Azure. This provides you a way to create your services just once but deploy them many times. From the business perspective, this means you have better TCO and lower administrative costs. Resources for Article: Further resources on this subject: Deploying and Synchronizing Azure Active Directory [article] What is Azure API Management? [article] Installing and Configuring Windows Azure Pack [article]
Read more
  • 0
  • 0
  • 6029
article-image-about-certified-openstack-administrator-exam
Packt
15 Mar 2017
8 min read
Save for later

About the Certified OpenStack Administrator Exam

Packt
15 Mar 2017
8 min read
In this article by Matt Dorn, authors of the book Certified OpenStack Administrator Study Guide, we will learn and understand how we can pass the Certified OpenStack Administrator exam successfully! (For more resources related to this topic, see here.) Benefits of passing the exam Ask anyone about getting started in the IT world and they may suggest looking into industry-recognized technical certifications. IT certifications measure competency in a number of areas and are a great way to open doors to opportunities. While they certainly should not be the only determining factor in the hiring process, achieving them can be a measure of your competence and commitment to facing challenges. If you pass... Upon completion of a passing grade, you will receive your certificate. Laminate, frame, or pin it to your home office wall or work cubicle! It's proof that you have met all the requirements to become an official OpenStack Administrator. The certification is valid for three years from the pass date so don't forget to renew! The OpenStack Foundation has put together a great tool for helping employers verify the validity of COA certifications. Check out the Certified OpenStack Administrator verification tool. In addition to the certification, a COA badge will appear next to your name in the OpenStack Foundation's Member Directory: 7 steps to becoming a Certified OpenStack Administrator! The journey of a thousand miles begins with one step.                                                                       -Lao Tzu Let's begin by walking through some steps to become a Certified OpenStack Administrator! Step 1 – study! Practice! Practice! Practice! Use this article and the included OpenStack all-in-one virtual appliance as a resource as you begin your Certified OpenStack Administrator journey. If you still find yourself struggling with the concepts and objectives in this article, you can always refer to the official OpenStack documentation or even seek out a live training class at the OpenStack training marketplace. Step 2 – purchase! Once you feel that you're ready to conquer the exam, head to the Official Certified OpenStack Administrator homepage and click on the Get Started link. After signing in, you will be directed to checkout to purchase your exam. The OpenStack Foundation accepts all major credit cards and as of March 2017, costs $300.00 USD but is subject to change so keep an eye out on the website. You can also get a free retake within 12 months of the original exam purchase date if you do not pass on the first attempt. To encourage academia students to get their feet wet with OpenStack technologies, the OpenStack Foundation is offering the exam for $150.00 (50% off the retail price) with a valid student ID.  Check out https://www.openstack.org/coa/student/ for more info. Step 3 – COA portal page Once your order is processed, you will receive an email with access to the COA portal. Think of the portal as your personal COA website where you can download your exam receipt and keep track of your certification efforts. Once you take the exam, you can come back to the COA portal to check your exam status, exam score, and even download certificates and badges for displaying on business cards or websites! Step 4 – hardware compatibility check The COA exam can be taken from your personal laptop or desktop but you must ensure that your system meets the exam's minimum system requirements. A link on the COA portal page will present you with the compatibility check tool which will run a series of tests to ensure you meet the requirements.  It will also assist you in downloading a Chrome plugin for taking the exam. At this time, you must use the Chrome or Chromium browser and have access to reliable internet, a webcam, and microphone. Here is a current list of requirements: Step 5 – identification You must be at least 18 years old and have proper identification to take the exam! Any of the following pieces of identification are acceptable: Passport Government-issued driver's license or permit National identity card State or province-issued identity card Step 6 – schedule the exam I personally recommend scheduling your exam a few months ahead of time to give yourself a realistic goal. Click on the schedule exam link on the COA portal to be directed and automatically logged into the exam proctor partner website. Once logged into the site, type OpenStack Foundation in the search box and select the COA exam. You will then choose from available dates and times. The latest possible exam date you can schedule will be 30 days out from the current date. Once you have scheduled it, you can cancel or reschedule up to 24 hours before the start time of the exam. Step 7 – take the exam! Your day has arrived! You've used this article and have practiced day and night to master all of the covered objectives! It's finally time to take the exam! One of the most important factors determining your success on the exam is the location. You cannot be in a crowded place! This means no coffee shops, work desks, or football games! The testing location policy is very strict, so please consider taking the exam from home or perhaps a private room in the office. Log into the COA portal fifteen minutes before your scheduled exam time. You should now see a take exam link which will connect to the exam proctor partner website so you can connect to the testing environment. Once in the exam environment, an Exam Proctor chat window will appear and assist you with starting your exam. You must allow sharing of your entire operating system screen (this includes all applications), webcam, and microphone. It's time to begin! You have two and a half hours to complete all exam objectives. You're almost on your way to becoming a Certified OpenStack Administrator! About the exam environment The exam expects its test-takers to be proficient in interacting with OpenStack via the Horizon dashboard and command-line interface. Here is a visual representation of the exam console as outlined in the COA candidate handbook: The exam console is embedded into the browser. It is composed of two primary parts: the Content Panel and the Dashboard/Terminal Panel. The Content Panel is the section that displays the exam timer and objectives. As per the COA handbook, exam objectives can only be navigated linearly. You can use the Next and Back button to move to each objective. If you struggle with a question, move on! Hit the Next button and try the next objective. You can always come back and tackle it before time is up. The Dashboard/Terminal Panel gives you full access to an OpenStack environment. As of March 2017 and the official COA website, the exam is on the liberty version of OpenStack. The exam will be upgraded to Newton and will launch at the OpenStack Summit Boston in May 2017. The exam console terminal is embedded in a browser and you cannot Secure Copy (SCP) to it from your local system. Within the terminal environment, you are permitted to install a multiplexor such as screen, tmux, or byobu if you think these will assist you but are not necessary for successful completion of all objectives. You are not permitted to browse websites, e-mail, or notes during the exam but you are free to access the official OpenStack documentation webpages. This can be a major waste of time on the exam and shouldn't be necessary after working through the exam objectives in this article. You can also easily copy and paste from the objective window into the Horizon dashboard or terminal. The exam is scored automatically within 24 hours and you should receive the results via e-mail within 72 hours after exam completion. At this time, the results will be made available on the COA portal. Please review the professional code of conduct on the OpenStack Foundation certification handbook. The exam objectives Let's now take a look at the objectives you will be responsible for performing on the exam. As of March 2017, these are all the exam objectives published on the official COA website. These domains cover multiple core OpenStack services as well as general OpenStack troubleshooting. Together, all of these domains make up 100% of the exam. Because some of the objectives on the official COA requirements list overlap, this article utilizes its own convenient strategy to ensure you can fulfill all objectives within all content areas. Summary OpenStack is open source cloud software that provides an Infrastructure as a Service environment to enable its users to quickly deploy applications by creating virtual resources like virtual servers, networks, and block storage volumes. The IT industry's need for individuals with OpenStack skills is continuing to grow and one of the best ways to prove you have those skills is by taking the Certified OpenStack Administrator exam. Matt Dorn Resources for Article: Further resources on this subject: Deploying OpenStack – the DevOps Way [article] Introduction to Ansible [article] Introducing OpenStack Trove [article]
Read more
  • 0
  • 0
  • 3351

Packt
23 Feb 2017
12 min read
Save for later

How to start using AWS

Packt
23 Feb 2017
12 min read
In this article, by Lucas Chan and Rowan Udell, authors of the book, AWS Administration Cookbook, we will cover the following: Infrastructure as Code AWS CloudFormation (For more resources related to this topic, see here.) What is AWS? Amazon Web Services (AWS) is a public cloud provider. It provides infrastructure and platform services at a pay-per-use rate. This means you get on-demand access to resources that you used to have to buy outright. You can get access to enterprise-grade services while only paying for what you need, usually down to the hour. AWS prides itself on providing the primitives to developers so that they can build and scale the solutions that they require. How to create an AWS account In order to follow along, you will need an AWS account. Create an account here. by clicking on the Sign Up button and entering your details. Regions and Availability Zones on AWS A fundamental concept of AWS is that its services and the solutions built on top of them are architected for failure. This means that a failure of the underlying resources is a scenario actively planned for, rather than avoided until it cannot be ignored. Due to this, all the services and resources available are divided up in to geographically diverse regions. Using specific regions means you can provide services to your users that are optimized for speed and performance. Within a region, there are always multiple Availability Zones (a.k.a. AZ). Each AZ represents a geographically distinct—but still close—physical data center. AZs have their own facilities and power source, so an event that might take a single AZ offline is unlikely to affect the other AZs in the region. The smaller regions have at least two AZs, and the largest has five. At the time of writing this, the following regions are active: Code Name Availability Zones us-east-1 N. Virginia 5 us-east-2 Ohio 3 us-west-1 N. California 3 us-west-2 Oregon 3 ca-central-1 Canada 2 eu-west-1 Ireland 3 eu-west-2 London 2 eu-central-1 Frankfurt 2 ap-northeast-1 Tokyo 3 ap-northeast-2 Seoul 2 ap-southeast-1 Singapore 2 ap-southeast-2 Sydney 3 ap-south-1 Mumbai 2 sa-east-1 Sao Paulo 3 The AWS web console The web-based console is the first thing you will see after creating your AWS account, and you will often refer to it when viewing and confirming your configuration. The AWS web console The console provides an overview of all the services available as well as associated billing and cost information. Each service has its own section, and the information displayed depends on the service being viewed. As new features and services are released, the console will change and improve. Don't be surprised if you log in and things have changed from one day to the next. Keep in mind that the console always shows your resources by region. If you cannot see a resource that you created, make sure you have the right region selected. Choose the region closest to your physical location for the fastest response times. Note that not all regions have the same services available. The larger, older regions generally have the most services available. Some of the newer or smaller regions (that might be closest to you) might not have all services enabled yet. While services are continually being released to regions, you may have to use another region if you simply must use a newer service. The us-east-1 (a.k.a. North Virginia) region is special given its status as the first region. All services are available there, and new services are always released there. As you get more advanced with your usage of AWS, you will spend less time in the console and more time controlling your services programmatically via the AWS CLI tool and CloudFormation, which we will go into in more detail in the next few topics. CloudFormation templates CloudFormation is the Infrastructure as Code service from AWS. Where CloudFormation was not applicable, we have used the AWS CLI to make the process repeatable and automatable. Since the recipes are based on CloudFormation templates, you can easily combine different templates to achieve your desired outcomes. By editing the templates or joining them, you can create more useful and customized configurations with minimal effort. What is Infrastructure as Code? Infrastructure as Code (IaC) is the practice of managing infrastructure though code definitions. On an Infrastructure-as-a-Service (IaaS) platform such as AWS, IaC is needed to get the most utility and value. IaC differs primarily from traditional interactive methods of managing infrastructure because it is machine processable. This enables a number of benefits: Improved visibility of resources Higher levels of consistency between deployments and environments Easier troubleshooting of issues The ability to scale more with less effort Better control over costs On a less tangible level, all of these factors contribute to other improvements for your developers: you can now leverage tried-and-tested software development practices for your infrastructure and enable DevOps practices in your teams. Visibility As your infrastructure is represented in machine-readable files, you can treat it like you do your application code. You can take the best-practice approaches to software development and apply them to your infrastructure. This means you can store it in version control (for example, Git and SVN) just like you do your code, along with the benefits that it brings: All changes to infrastructure are recorded in commit history You can review changes before accepting/merging them You can easily compare different configurations You can pick and use specific point-in-time configurations Consistency Consistent configuration across your environments (for example, dev, test, and prod) means that you can more confidently deploy your infrastructure. When you know what configuration is in use, you can easily test changes in other environments due to a common baseline. IaC is not the same as just writing scripts for your infrastructure. Most tools and services will leverage higher-order languages and DSLs to allow you to focus on your higher-level requirements. It enables you to use advanced software development techniques, such as static analysis, automated testing, and optimization. Troubleshooting IaC makes replicating and troubleshooting issues easier: since you can duplicate your environments, you can accurately reproduce your production environment for testing purposes. In the past, test environments rarely had exactly the same infrastructure due to the prohibitive cost of hardware. Now that it can be created and destroyed on demand, you are able to duplicate your environments only when they are needed. You only need to pay for the time that they are running for, usually down to the hour. Once you have finished testing, simply turn your environments off and stop paying for them. Even better than troubleshooting is fixing issues before they cause errors. As you refine your IaC in multiple environments, you will gain confidence that is difficult to obtain without it. By the time you deploy your infrastructure in to production, you have done it multiple times already. Scale Configuring infrastructure by hand can be a tedious and error-prone process. By automating it, you remove the potential variability of a manual implementation: computers are good at boring, repetitive tasks, so use them for it! Once automated, the labor cost of provisioning more resources is effectively zero—you have already done the work. Whether you need to spin up one server or a thousand, it requires no additional work. From a practical perspective, resources in AWS are effectively unconstrained. If you are willing to pay for it, AWS will let you use it. Costs AWS have a vested (commercial) interest in making it as easy as possible for you to provision infrastructure. The benefit to you as the customer is that you can create and destroy these resources on demand. Obviously, destroying infrastructure on demand in a traditional, physical hardware environment is simply not possible. You would be hard-pressed to find a data center that will allow you to stop paying for servers and space simply because you are not currently using them. Another use case where on demand infrastructure can make large cost savings is your development environment. It only makes sense to have a development environment while you have developers to use it. When your developers go home at the end of the day, you can switch off your development environments so that you no longer pay for them. Before your developers come in in the morning, simply schedule their environments to be created. DevOps DevOps and IaC go hand in hand. The practice of storing your infrastructure (traditionally the concern of operations) as code (traditionally the concern of development) encourages a sharing of responsibilities that facilitates collaboration. Image courtesy: Wikipedia By automating the PACKAGE, RELEASE, and CONFIGURE activities in the software development life cycle (as pictured), you increase the speed of your releases while also increasing the confidence. Cloud-based IaC encourages architecture for failure: as your resources are virtualized, you must plan for the chance of physical (host) hardware failure, however unlikely. Being able to recreate your entire environment in minutes is the ultimate recovery solution. Unlike physical hardware, you can easily simulate and test failure in your software architecture by deleting key components—they are all virtual anyway! Server configuration Server-side examples of IaC are configuration-management tools such as Ansible, Chef, and Puppet. While important, these configuration-management tools are not specific to AWS, so we will not be covering them in detail here. IaC on AWS CloudFormation is the IaC service from AWS. Templates written in a specific format and language define the AWS resources that should be provisioned. CloudFormation is declarative and can not only provision resources, but also update them. We will go into CloudFormation in greater detail in the following topic. CloudFormation We'll use CloudFormation extensively throughout, so it's important that you have an understanding of what it is and how it fits in to the AWS ecosystem. There should easily be enough information here to get you started, but where necessary, we'll refer you to AWS' own documentation. What is CloudFormation? The CloudFormation service allows you to provision and manage a collection of AWS resources in an automated and repeatable fashion. In AWS terminology, these collections are referred to as stacks. Note however that a stack can be as large or as small as you like. It might consist of a single S3 bucket, or it might contain everything needed to host your three-tier web app. In this article, we'll show you how to define the resources to be included in your CloudFormation stack. We'll talk a bit more about the composition of these stacks and why and when it's preferable to divvy up resources between a number of stacks. Finally, we'll share a few of the tips and tricks we've learned over years of building countless CloudFormation stacks. Be warned: pretty much everyone incurs at least one or two flesh wounds along their journey with CloudFormation. It is all very much worth it, though. Why is CloudFormation important? By now, the benefits of automation should be starting to become apparent to you. But don't fall in to the trap of thinking CloudFormation will be useful only for large collections of resources. Even performing the simplest task of, say, creating an S3 bucket can get very repetitive if you need to do it in every region. We work with a lot of customers who have very tight controls and governance around their infrastructure, especially in the finance sector, and especially in the network layer (think VPCs, NACLs, and security groups). Being able to express one's cloud footprint in YAML (or JSON), store it in a source code repository, and funnel it through a high-visibility pipeline gives these customers confidence that their infrastructure changes are peer-reviewed and will work as expected in production. Discipline and commitment to IaC SDLC practices are of course a big factor in this, but CloudFormation helps bring us out of the era of following 20-page run-sheets for manual changes, navigating untracked or unexplained configuration drift, and unexpected downtime caused by fat fingers. The layer cake Now is a good time to start thinking about your AWS deployments in terms of layers. Your layers will sit atop one another, and you will have well-defined relationships between them. Here's a bottom-up example of how your layer cake might look: VPC Subnets, routes, and NACLs NAT gateways, VPN or bastion hosts, and associated security groups App stack 1: security groups, S3 buckets App stack 1: cross-zone RDS and read replica App stack 1: app and web server auto scaling groups and ELBs App stack 1: cloudfront and WAF config In this example, you may have many occurrences of the app stack layers inside your VPC, assuming you have enough IP addresses in your subnets! This is often the case with VPCs living inside development environments. So immediately, you have the benefit of multi-tenancy capability with application isolation. One advantage of this approach is that while you are developing your CloudFormation template, if you mess up the configuration of your app server, you don't have to wind back all the work CFN did on your behalf. You can just turf that particular layer (and the layers that depend on it) and restart from there. This is not the case if you have everything contained in a single template. We commonly work with customers for whom ownership and management of each layer in the cake reflects the structure of the technological divisions within a company. The traditional infrastructure, network, and cyber security folk are often really interested in creating a safe place for digital teams to deploy their apps, so they like to heavily govern the foundational layers of the cake. Conway's Law, coined by Melvin Conway, starts to come in to play here: "Any organization that designs a system will inevitably produce a design whose structure is a copy of the organization's communication structure." Finally, even if you are a single-person infrastructure coder working in a small team, you will benefit from this approach. For example, you'll find that it dramatically reduces your exposure to things such as AWS limits, timeouts, and circular dependencies.
Read more
  • 0
  • 0
  • 3916

article-image-deploying-and-synchronizing-azure-active-directory
Packt
08 Feb 2017
5 min read
Save for later

Deploying and Synchronizing Azure Active Directory

Packt
08 Feb 2017
5 min read
In this article by Jan-henrik Damaschke and Oliver Michalski, authors of the book Implementing Azure Solutions, we will learn about growing cloud services identity management and security as well as access policies within cloud environments and cloud services become more essential and important. The Microsoft central instance for this is Azure Active Directory. Every security policy or identity Microsoft provides for their cloud services is based on Azure Active Directory. Within this article you will learn the basics about Azure Active Directory, how you implement Azure AD and hybrid Azure Active Directory with connection to Active Directory Domain Services. We are going to explore the following topics: Azure Active Directory overview Azure Active Directory Subscription Options Azure Active Directory Deployment Azure Active Directory User and Subscription Management How to deploy Azure Active Directory Hybrid Identities with Active Directory Domain Service Azure Active Directory Hybrid high available and none high available Deployments (For more resources related to this topic, see here.) Azure Active Directory Azure Active Directory or Azure AD a multi-tenant cloud based directory and identity management service developed by Microsoft. Azure AD also includes a full suite of identity management capabilities including multi-factor authentication, device registration, self-service password management, self-service group management, privileged account management, role based access control, application usage monitoring, rich auditing and security monitoring and alerting. Azure AD can be integrated with an existing Windows Server Active Directory, giving organizations the ability to leverage their existing on-premises identities to manage access to cloud based SaaS applications. After this article you will know how to setup Azure AD and Azure Connect. You will also able to design a high available infrastructure for identity replication. The following figure describes the general structure of Azure AD in a hybrid deployment with Active Directory Domain Services: Source: https://azure.microsoft.com/en-us/documentation/articles/active-directory-whatis/ Customers who already using Office 365, CRM online or Intune using Azure AD for their service. You can easily identify if you use Azure AD if you have a username like user@domain.onmicrosoft.com (.de or .cn are also possible if you are using Microsoft Cloud Germany or Azure China). Azure AD is a multi-tenant, geo-distributed, high available service running in 28 and more datacenters around the world. Microsoft implemented automated failover and at least a minimum of two copies of you Active Directory service in other regional or global datacenters. Regularly your directory is running in your primary datacenter, is replicated into another two in your region. If you only have two Azure datacenters in your region like in Europe, the copy will distribute to another region outside yours: Azure Active Directory options There are currently three selectable options for Azure Active Directory with different features to use. There will be a fourth option, Azure Active Directory Premium P2 available later 2016. Azure AD free Supports common features such as: Directory objects with up to 500,000 objects User/group management (add/update/delete)/ user-based provisioning, device registration Single Sign-On for up to 10 applications per user Self-service password change for cloud users Connect and sync with on-premises Active Directory Domain Service Up to 3 basic security and usage reports Azure AD basic Supports common features such as: Directory objects with unlimited objects User/group management (add/update/delete)/ user-based provisioning, device registration Single Sign-On for up to 10 applications per user Self-service password change for cloud users Connect and sync with on-premises Active Directory Domain Service Up to 3 basic security and usage reports Basic features such as: Group-based access management/provisioning Self-service password reset for cloud users Company branding (logon pages/ access panel customization) Application proxy Service level agreement 99,9% Azure AD premium P1 Supports common features such as: Directory Objects with unlimited objects User/group management (add/update/delete)/ user-based provisioning, device registration Single Sign-On for up to 10 applications per user Self-service password change for cloud users Connect and sync with on-premises Active Directory Domain Service Up to 3 basic security and usage reports Basic features such as: Group-based access management/provisioning Self-service password reset for cloud users Company branding (logon pages/ access panel customization) Application proxy Service level agreement 99,9% Premium features such as: Self-service group and app management/self-service application additions/ dynamic groups Self-service password reset/change/unlock with on-premises write-back Multi-factor authentication (the Cloud and on-premises (MFA server)) MIM CAL with MIM server Cloud app discovery Connect health Automatic password rollover for group accounts Within Q3/2016 Microsoft will also enable customers to use the Azure Active Directory P2 plan, includes all the capabilities in Azure AD Premium P1 as well as the new identity protection and privileged identity management capabilities. That is a necessary step for Microsoft to extend it’s offering for Windows 10 Device Management with Azure AD. Currently Azure AD enable Windows 10 customers to join a device to Azure AD, implement SSO for desktops, Microsoft passport for Azure AD and central Administrator Bitlocker recovery. It also adds automatic classification to Active Directory Rights Management Service in Azure AD. Depending on what you plan to do with your Azure Environment, you should choose your Azure Active Directory Option.  
Read more
  • 0
  • 0
  • 1580
article-image-deploying-openstack-devops-way
Packt
07 Feb 2017
16 min read
Save for later

Deploying OpenStack – the DevOps Way

Packt
07 Feb 2017
16 min read
In this article by Chandan Dutta Chowdhury, Omar Khedher, authors of the book Mastering OpenStack - Second Edition, we will cover the Deploying an OpenStack environment based on the profiled design. Although we created our design by taking care of several aspects related to scalability and performance, we still have to make it real. If you are still looking at OpenStack as a single block system. (For more resources related to this topic, see here.) Furthermore, in the introductory section of this article, we covered the role of OpenStack in the next generation of data centers. A large-scale infrastructure used by cloud providers with a few thousand servers needs a very different approach to set up. In our case, deploying and operating the OpenStack cloud is not as simple as you might think. Thus, you need to make the operational task easier or, in other words, automated. In this article, we will cover new topics about the ways to deploy OpenStack. The next part will cover the following points: Learning what the DevOps movement is and how it can be adopted in the cloud Knowing how to see your infrastructure as code and how to maintain it Getting closer to the DevOps way by including configuration management aspects in your cloud Making your OpenStack environment design deployable via automation Starting your first OpenStack environment deployment using Ansible DevOps in a nutshell The term DevOps is a conjunction of development (software developers) and operations (managing and putting software into production). Many IT organizations have started to adopt such a concept, but the question is how and why? Is it a job? Is it a process or a practice? DevOps is development and operations compounded, which basically defines a methodology of software development. It describes practices that streamline the software delivery process. It is about raising communication and integration between developers, operators (including administrators), and quality assurance teams. The essence of the DevOps movement lies in leveraging the benefits of collaboration. Different disciplines can relate to DevOps in different ways and bring their experiences and skills together under the DevOps banner to gain shared values. So, DevOps is a methodology that integrates the efforts of several disciplines, as shown in the following figure: This new movement is intended to resolve the conflict between developers and operators. Delivering a new release affects the production systems. It puts different teams in conflicting positions by setting different goals for them, for example, the development team wants the their latest code to go live while the operations team wants more time to test and stage the changes before going to production. DevOps fills the gap and streamlines the process of bringing in change by bringing in collaboration between the developers and operators. DevOps is neither a toolkit nor a job; it is the synergy that streamlines the process of change. Let's see how DevOps can incubate a cloud project. DevOps and cloud – everything is code Let's look at the architecture of cloud computing. While discussing a cloud infrastructure, we must remember that we are talking about a large scalable environment! The amazing switch to bigger environments requires us to simplify everything as much as possible. System architecture and software design are becoming more and more complicated. Every new release of software affords new functions and new configurations. Administering and deploying a large infrastructure would not be possible without adopting a new philosophy: infrastructure as code. When infrastructure is seen as code, the components of a given infrastructure are modeled as modules of code. What you need to do is to abstract the functionality of the infrastructure into discrete reusable components, design the services provided by the infrastructure as modules of code, and finally implement them as blocks of automation. Furthermore, in such a paradigm, it will be essential to adhere to the same well-established discipline of software development as an infrastructure developer. The essence of DevOps mandates that developers, network engineers, and operators must work alongside each other to deploy, operate, and maintain cloud infrastructure which will power our next-generation data center. DevOps and OpenStack OpenStack is an open source project, and its code is extended, modified, and fixed in every release. It is composed of multiple projects and requires extensive skills to deploy and operate. Of course, it is not our mission to check the code and dive into its different modules and functions. So what can we do with DevOps, then? Deploying complex software on a large-scale infrastructure requires adopting new strategy. The ever-increasing complexity of software such as OpenStack and deployment of huge cloud infrastructure must be simplified. Everything in a given infrastructure must be automated! This is where OpenStack meets DevStack. Breaking down OpenStack into pieces Let's gather what we covered previously and signal a couple of steps towards our first OpenStack deployment: Break down the OpenStack infrastructure into independent and reusable services. Integrate the services in such a way that you can provide the expected functionalities in the OpenStack environment. It is obvious that OpenStack includes many services, What we need to do is see these services as packages of code in our infrastructure as code experience. The next step will investigate how to integrate the services and deploy them via automation. Deploying service as code is similar to writing a software application. Here are important points you should remember during the entire deployment process: Simplify and modularize the OpenStack services Develop OpenStack services as building blocks which integrate with other components to provide a complete system Facilitate the customization and improvement of services without impacting the complete system. Use the right tool to build the services Be sure that the services provide the same results with the same input Switch your service vision from how to do it to what we want to do Automation is the essence of DevOps. In fact, many system management tools are intensely used nowadays due to their efficiency of deployment. In other words, there is a need for automation! You have probably used some of the automation tools, such as Ansible, Chef, Puppet, and many more. Before we go through them, we need to create a succinct, professional code management step. Working with the infrastructure deployment code While dealing with infrastructure as code, the code that abstracts, models, and builds the OpenStack infrastructure must be committed to source code management. This is required for tracking changes in our automation code and reproducibility of results. Eventually, we must reach a point where we shift our OpenStack infrastructure from a code base to a deployed system while following the latest software development best practices. At this stage, you should be aware of the quality of your OpenStack infrastructure deployment, which roughly depends on the quality of the code that describes it. It is important to highlight a critical point that you should keep in mind during all deployment stages: automated systems are not able to understand human error. You'll have to go through an ensemble of phases and cycles using agile methodologies to end up with a release that is a largely bug-free to be promoted to the production environment. On the other hand, if mistakes cannot be totally eradicated, you should plan for the continuous development and testing of code. The code's life cycle management is shown in the following figure: Changes can be scary! To handle changes, it is recommended that you do the following: Keep track of and monitor the changes at every stage Build flexibility into the code and make it easy to change Refactor the code when it becomes difficult to manage Test, test, and retest your code Keep checking every point that has been described previously till you start to get more confident that your OpenStack infrastructure is being managed by code that won't break. Integrating OpenStack into infrastructure code To keep the OpenStack environment working with a minimum rate of surprises and ensure that the code delivers the functionalities that are required, we must continuously track the development of our infrastructure code. We will connect the OpenStack deployment code to a toolchain, where it will be constantly monitored and tested as we continue to develop and refine our code. This toolchain is composed of a pipeline of tracking, monitoring, testing, and reporting phases and is well known as a continuous integration and continuous development (CI-CD) process. Continuous integration and delivery Let's see how continuous integration (CI) can be applied to OpenStack. The life cycle of our automation code will be managed by the following categories of tools: System Management Tool Artifact (SMTA) can be any IT automation tool juju charms. Version Control System (VCS) tracks changes to our infrastructure deployment code. Any version control system, such as CVS, Subversion, or Bazaar, that you are most familiar with can be used for this purpose. Git can be a good outfit for our VCS. Jenkins is a perfect tool that monitors to changes in the VCS and does the continuous integration testing and reporting of results. Take a look at the model in the following figure: The proposed life-cycle for infrastructure as code consists of infrastructure configuration files that are recorded in a version control system and are built continuously by the means of a CI server (Jenkins, in our case). Infrastructure configuration files can be used to set up a unit test environment (a virtual environment using Vagrant, for example) and makes use of any system management tool to provision the infrastructure (Ansible, Chef, puppet, and so on). The CI server keeps listening to changes in version control and automatically propagates any new versions to be tested, and then it listens to target environments in production. Vagrant allows you to build a virtual environment very easily; it is based on Oracle VirtualBox (https://www.virtualbox.org/) to run virtual machines, so you will need these before moving on with the installation in your test environment. The proposed life-cycle for infrastructure code highlights the importance of a test environment before moving on to production. You should give a lot of importance to and care a good deal about the testing stage, although this might be a very time-consuming task. Especially in our case, with infrastructure code for deploying OpenStack that is complicated and has multiple dependencies on other systems, the importance of testing cannot be overemphasized. This makes it imperative to ensure effort is made for an automated and consistent testing of the infrastructure code. The best way to do this is to keep testing thoroughly in a repeated way till you gain confidence about your code. Choosing the automation tool At first sight, you may wonder what the best automation tool is that will be useful for our OpenStack production day. We have already chosen Git and Jenkins to handle our continuous integration and testing. It is time to choose the right tool for automation. It might be difficult to select the right tool. Most likely, you'll have to choose between several of them. Therefore, giving succinct hints on different tools might be helpful in order to distinguish the best outfit for certain particular setups. Of course, we are still talking about large infrastructures, a lot of networking, and distributed services. Giving the chance for one or more tools to be selected, as system management parties can be effective and fast for our deployment. We will use Ansible for the next deployment phase. Introducing Ansible We have chosen Ansible to automate our cloud infrastructure. Ansible is an infrastructure automation engine. It is simple to get started with and yet is flexible enough to handle complex interdependent systems. The architecture of Ansible consists of the deployment system where Ansible itself is installed and the target systems that are managed by Ansible. It uses an agentless architecture to push changes to the target systems. This is due to the use of SSH protocol as its transport mechanism to push changes to the target systems. This also means that there is no extra software installation required on the target system. The agentless architecture makes setting up Ansible very simple. Ansible works by copying modules over SSH to the target systems. It then executes them to change the state of the target systems. Once executed, the Ansible modules are cleaned up, leaving no trail on the target system. Although the default mechanism for making changes to the client system is an SSH-based push-model, if you feel the push-based model for delivering changes is not scalable enough for your infrastructure, Ansible also supports an agent-based pull-model. Ansible is developed in python and comes with a huge collection of core automation modules. The configuration files for Ansible are called Playbooks and they are written in YAML, which is just a markup language. YAML is easier to understand; it's custom-made for writing configuration files. This makes learning Ansible automation much easier. The Ansible Galaxy is a collection of reusable Ansible modules that can be used for your project. Modules Ansible modules are constructs that encapsulate a system resource or action. A module models the resource and its attributes. Ansible comes with packages with a wide range of core modules to represent various system resources; for example, the file module encapsulates a file on the system and has attributes such as owner, group, mode, and so on. These attributes represent the state of a file in the system; by changing the attributes of the resources, we can describe the required final state of the system. Variables While modules can represent the resources and actions on a system, the variables represent the dynamic part of the change. Variables can be used to modify the behavior of the modules. Variables can be defined from the environment of the host, for example, the hostname, IP address, version of software and hardware installed on a host, and so on. They can also be user-defined or provided as part of a module. User-defined variables can represent the classification of a host resource or its attribute. Inventory An inventory is a list of hosts that are managed by Ansible. The inventory list supports classifying hosts into groups. In its simplest form, an inventory can be an INI file. The groups are represented as article on the INI file. The classification can be based on the role of the hosts or any other system management need. It is possible to have a host to appear in multiple groups in an inventory file. The following example shows a simple inventory of hosts: logserver1.example.com [controllers] ctl1.example.com ctl2.example.com [computes] compute1.example.com compute2.example.com compute3.example.com compute[20:30].example.com The inventory file supports special patterns to represent large groups of hosts. Ansible expects to find the inventory file at /etc/ansible/hosts, but a custom location can be passed directly to the Ansible command line. Ansible also supports dynamic inventory that can be generated by executing scripts or retrieved from another management system, such as a cloud platform. Roles Roles are the building blocks of an Ansible-based deployment. They represent a collection of tasks that must be performed to configure a service on a group of hosts. The Role encapsulates tasks, variable, handlers, and other related functions required to deploy a service on a host. For example, to deploy a multinode web server cluster, the hosts in the infrastructure can be assigned roles such as web server, database server, load balancer, and so on. Playbooks Playbooks are the main configuration files in Ansible. They describe the complete system deployment plan. Playbooks are composed a series of tasks and are executed from top to bottom. The tasks themselves refer to group of hosts that must be deployed with roles. Ansible playbooks are written in YAML. The following is an example of a simple Ansible Playbook: --- - hosts: webservers vars: http_port: 8080 remote_user: root tasks: - name: ensure apache is at the latest version yum: name=httpd state=latest - name: write the apache config file template: src=/srv/httpd.j2 dest=/etc/httpd.conf notify: - restart apache handlers: - name: restart apache service: name=httpd state=restarted Ansible for OpenStack OpenStack Ansible (OSA) is an official OpenStack Big Tent project. It focuses on providing roles and playbooks for deploying a scalable, production-ready OpenStack setup. It has a very active community of developers and users collaborating to stabilize and bring new features to OpenStack deployment. One of the unique features of the OSA project is the use of containers to isolate and manage OpenStack services. OSA installs OpenStack services in LXC containers to provide each service with an isolated environment. LXC is an OS-level container and it encompasses a complete OS environment that includes a separate filesystem, networking stack, and resource isolation using cgroups. OpenStack services are spawned in separate LXC containers and speak to each other using the REST APIs. The microservice-based architecture of OpenStack complements the use of containers to isolate services. It also decouples the services from the physical hardware and provides encapsulation of the service environment that forms the foundation for providing portability, high availability, and redundancy. The OpenStack Ansible deployment is initiated from a deployment host. The deployment host is installed with Ansible and it runs the OSA playbooks to orchestrate the installation of OpenStack on the target hosts: The Ansible target hosts are the ones that will run the OpenStack services. The target nodes must be installed with Ubuntu 14.04 LTS and configured with SSH-key-based authentication to allow login from the deployment host. Summary In this article, we covered several topics and terminologies on how to develop and maintain a code infrastructure using the DevOps style. Viewing your OpenStack infrastructure deployment as code will not only simplify node configuration, but also improve the automation process. You should keep in mind that DevOps is neither a project nor a goal, but it is a methodology that will make your deployment successfully empowered by the team synergy with different departments. Despite the existence of numerous system management tools to bring our OpenStack up and running in an automated way, we have chosen Ansible for automation of our infrastructure. Puppet, Chef, Salt, and others can do the job but in different ways. You should know that there isn't one way to perform automation. Both Puppet and Chef have their own OpenStack deployment projects under the OpenStack Big Tent. Resources for Article: Further resources on this subject: Introduction to Ansible [article] OpenStack Networking in a Nutshell [article] Introducing OpenStack Trove [article]
Read more
  • 0
  • 0
  • 3989

article-image-what-azure-api-management
Packt
01 Feb 2017
15 min read
Save for later

What is Azure API Management?

Packt
01 Feb 2017
15 min read
In this article by Martin Abbott, Ashish Bhambhani, James Corbould, Gautam Gyanendra, Abhishek Kumar, and Mahindra Morar, authors of the book Robust Cloud Integration with Azure, we learn that it is important to know how to control and manage API assets that exist or are built as part of any enterprise development. (For more resources related to this topic, see here.) Typically, modern APIs are used to achieve one of the following two outcomes: First, to expose the on-premises line of business applications, such as Customer Relationship Management (CRM) or Enterprise Resource Planning (ERP) solutions to other applications that need to consume and interact with these enterprise assets both on-premises and in the cloud Second, to provide access to the API for commercial purposes to monetize access to the assets exposed by the API The latter use case is important as it allows organizations to extend the use of their API investment, and it has led to what has become known as the API economy. The API economy provides a mechanism to gain additional value from data contained within the organizational boundary whether that data exists in the cloud or on-premises. When providing access to information via an API, two considerations are important: Compliance: This ensures that an access to the API and the use of the API meets requirements around internal or legal policies and procedures, and it provides reporting and auditing information Governance: This ensures the API is accessed and used only by those authorized to do so, and in a way, that is controlled and if necessary metered, and provides reporting and auditing information, which can be used, for example, to provide usage information for billing In order to achieve this at scale in an organization, a tool is required that can be used to apply both compliance and governance structures to an exposed endpoint. This is required to ensure that the usage of the information behind that endpoint is limited only to those who should be allowed access and only in a way that meets the requirements and policies of the organization. This is where API Management plays a significant role. There are two main types of tools that fit within the landscape that broadly fall under the banner of API Management: API Management: These tools provide the compliance and governance control required to ensure that the exposed API is used appropriately and data presented in the correct format. For example, a message may be received in the XML format, but the consuming service may need the data in the JSON format. They can also provide monitoring tools and access control that allows organizations to gain insight into the use of the API, perhaps with the view to charge a fee for access. API Gateway: These tools provide the same or similar level of management as normal API Management tools, but often include other functionality that allows some message mediation and message orchestration thereby allowing more complex interactions and business processes to be modeled, exposed, and governed. Microsoft Azure API Management falls under the first category above whilst Logic Apps, provide the capabilities (and more) that API Gateways offer. Another important aspect of providing management of APIs is creating documentation that can be used by consumers, so they know how to interact with and get the best out of the API. For APIs, generally, it is not a case of build it and they will come, so some form of documentation that includes endpoint and operation information, along with sample code, can lead to greater uptake of usage of the API. Azure API Management is currently offered in three tiers: Developer, Standard, and Premium. The details associated with these tiers at the time of writing are shown in the following table:   Developer Standard Premium API Calls (per unit) 32 K / day ( ~1 M / month ) 7 M / day ( ~217 M / month ) 32 M / day ( ~1 B / month ) Data Transfer (per unit) 161 MB / day ( ~5 GB / month ) 32 GB / day ( ~1 TB / month ) 161 GB / day ( ~5 TB / month ) Cache 10 MB 1 GB 5 GB Scale-out N/A 4 units Unlimited SLA N/A 99.9% 99.95% Multi-Region Deployment N/A N/A Yes Azure Active Directory Integration Unlimited user accounts N/A Unlimited user accounts VPN Yes N/A Yes Key items of note in the table are Scale-out, multiregion deployment, and Azure Active Directory Integration. Scale-out: This defines how many instances, or units, of the API instance are possible; this is configured through the Azure Classic Portal Multi-region deployment: When using Premium tier, it is possible to deploy the API Management instance to many locations to provided geographically distributed load Azure Active Directory Integration: If an organization synchronizes an on-premises Active Directory domain to Azure, access to the API endpoints can be configured to use Azure Active Directory to provide same sign-on capabilities The main use case for Premium tier is if an organization has many hundreds or even thousands of APIs they want to expose to developers, or in cases where scale and integration with line of business APIs is critical. The anatomy of Azure API Management To understand how to get the best out of an API, it is important to understand some terms that are used for APIs and within Azure API Management, and these are described here. API and operations An API provides an abstraction layer through an endpoint that allows interaction with entities or processes that would otherwise be difficult to consume. Most API developers favor using a RESTful approach to API applications since this allows us easy understanding on how to work with the operations that the API exposes and provides scalability, modifiability, reliability, and performance. Representational State Transfer (REST) is an architectural style that was introduced by Roy Fielding in his doctoral thesis in 2000 (http://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm). Typically, modern APIs are exposed using HTTP since this makes it easier for different types of clients to interact with it, and this increased interoperability provides the greatest opportunity to offer additional value and greater adoption across different technology stacks. When building an API, a set of methods or operations is exposed that a user can interact with in a predictable way. While RESTful services do not have to use HTTP as a transfer method, nearly all modern APIs do, since the HTTP standard is well known to most developers, and it is simple and straightforward to use. Since the operations are called via HTTP, a distinct endpoint or Unified Resource Identifier (URI) is required to ensure sufficient modularity of the API service. When calling an endpoint, which may for example represent, an entity in a line of business system, HTTP verbs (GET, POST, PUT, and DELETE, for example) are used to provide a standard way of interacting with the object. An example of how these verbs are used by a developer to interact with an entity is given in the following table: TYPE GET POST PUT DELETE Collection Retrieve a list of entities and their URIs Create a new entity in the collection Replace (update) a collection Delete the entire collection Entity Retrieve a specific entity and its information usually in a particular data format Create a new entity in the collection, not generally used Replace (update) an entity in the collection, or if it does not exist, create it Delete a specific entity from a collection When passing data to and receiving data from an API operation, the data needs to be encapsulated in a specific format. When services and entities were exposed through SOAP-based services, this data format was typically XML. For modern APIs, JavaScript Object Notation (JSON) has become the norm. JSON has become the format of choice since it has a smaller payload than XML and a smaller processing overhead, which suits the limited needs of mobile devices (often running on battery power). JavaScript (as the acronym JSON implies) also has good support for processing and generating JSON, and this suits developers, who can leverage existing toolsets and knowledge. API operations should abstract small amounts of work to be efficient, and in order to provide scalability, they should be stateless, and they can be scaled independently. Furthermore, PUT and DELETE operations must be created that ensure consistent state regardless of how many times the specific operation is performed, this leads to the need of those operations being idempotent. Idempotency describes an operation that when performed multiple times produces the same result on the object that is being operated on. This is an important concept in computing, particularly, where you cannot guarantee that an operation will only be performed once, such as with interactions over the Internet. Another outcome of using a URI to expose entities is that the operation is easily modified and versioned because any new version can simply be made available on a different URI, and because HTTP is used as a transport mechanism, endpoint calls can be cached to provide better performance and HTTP Headers can be used to provide additional information, for example security. By default, when an instance of API Management is provisioned, it has a single API already available named Echo API. This has the following operations: Creating resource Modifying resource Removing resource Retrieving header only Retrieving resource Retrieving resource (cached) In order to get some understanding of how objects are connected, this API can be used, and some information is given in the next section. Objects within API Management Within Azure API Management, there are a number of key objects that help define a structure and provide the governance, compliance, and security artifacts required to get the best out of a deployed API, as shown in the following diagram: As can be seen, the most important object is a PRODUCT. A product has a title and description and is used to define a set of APIs that are exposed to developers for consumption. They can be Open or Protected, with an Open product being publicly available and a Protected product requiring a subscription once published. Groups provide a mechanism to organize the visibility of and access to the APIs within a product to the development community wishing to consume the exposed APIs. By default, a product has three standard groups that cannot be deleted: Administrators: Subscription administrators are included by default, and the members of this group manage API services instances, API creation, API policies, operations, and products Developers: The members of this group have authenticated access to the Developer Portal; they are the developers who have chosen to build applications that consume APIs exposed as a specific product Guests: Guests are able to browse products through the Developer Portal and examine documentation, and they have read-only access to information about the products In addition to these built-in groups, it is possible to create new groups as required, including the use of groups within an Azure Active Directory tenant. When a new instance of API Management is provisioned, it has the following two products already configured: Starter: This product limits subscribers to a maximum of five calls per minute up to a maximum of 100 calls per week Unlimited: This product has no limits on use, but subscribers can only use it with the administrator approval Both of these products are protected, meaning that they need to be subscribed to and published. They can be used to help gain some understanding of how the objects within API Management interact. These products are configured with a number of sample policies that can be used to provide a starting point. Azure API Management policies API Management policies are the mechanism used to provide governance structures around the API. They can define, for instance, the number of call requests allowed within a period, cross-origin resource sharing (CORS), or certificate authentication to a service backend. Policies are defined using XML and can be stored in source control to provide active management. Policies are discussed in greater detail later in the article. Working with Azure API Management Azure API Management is the outcome of the acquisition by Microsoft of Apiphany, and as such it has its own management interfaces. Therefore, it has a slightly different look and feel to the standard Azure Portal content. The Developer and Publisher Portals are described in detail in this section, but first a new instance of API Management is required. Once created and the provisioning in the Azure infrastructure can take some time, most interactions take place through the Developer and Publisher Portals. Policies in Azure API Management In order to provide control over interactions with Products or APIs in Azure API Management, policies are used. Policies make it possible to change the default behavior of an API in the Product, for example, to meet the governance needs of your company or Product, and are a series of statements executed sequentially on each request or response of an API. Three demo scenarios will provide a "taster" of this powerful feature of Azure API Management. How to use Policies in Azure API Management Policies are created and managed through the Publisher Portal. The first step in policy creation is to determine at what scope the policy should be applied. Policies can be assigned to all Products, individual Products, the individual APIs associated with a Product, and finally the individual operations associated with an API. Secure your API in Azure API Management We have previously discussed how it is possible to organize APIs in Products with those products further refined through the use of Policies. Access to and visibility of products is controlled through the use of Groups and developer subscriptions for those APIs requiring subscriptions. In most enterprise scenarios where you are providing access to some line of business system on-premises, it is necessary to provide sufficient security on the API endpoint to ensure that the solution remains compliant. There are a number of ways to achieve this level of security using Azure API Management, such as using certificates, Azure Active Directory, or extending the corporate network into Microsoft Azure using a Virtual Private Network (VPN), and creating a hybrid cloud solution. Securing your API backend with Mutual Certificates Certificate exchange allows Azure API Management and an API to create a trust boundary based on encryption that is well understood and easy to use. In this scenario, because Azure API Management is communicating with an API that has been provided, a self-signed certificate is allowed as the key exchange for the certificate is via a trusted party. For an in-depth discussion on how to configure Mutual Certificate authentication to secure your API, please refer to the Azure API Management documentation (https://azure.microsoft.com/en-us/documentation/articles/api-management-howto-mutual-certificates/). Securing your API backend with Azure Active Directory If an enterprise already uses Azure Active Directory to provide single or same sign-on to cloud-based services, for instance, on-premises Active Directory synchronization via ADConnect or DirSync, then this provides a good opportunity to leverage Azure Active Directory to provide a security and trust boundary to on-premises API solutions. For an in-depth discussion on how to add Azure Active Directory to an API Management instance, please see the Azure API Management documentation (https://azure.microsoft.com/en-us/documentation/articles/api-management-howto-protect-backend-with-aad/). VPN connection in Azure API Management Another way of providing a security boundary between Azure API Management and the API is managing the creation of a virtual private network. A VPN creates a tunnel between the corporate network edge and Azure, essentially creating a hybrid cloud solution. Azure API Management supports site-to-site VPNs, and these are created using the Classic Portal. If an organization already has an ExpressRoute circuit provisioned, this can also be used to provide connectivity via private peering. Because a VPN needs to communicate to on-premises assets, a number of firewall port exclusions need to be created to ensure the traffic can flow between the Azure API Management instance and the API endpoint. Monitoring your API Any application tool is only as good as the insight you can gain from the operation of the tool. Azure API Management is no exception and provides a number of ways of getting information about how the APIs are being used and are performing. Summary API Management can be used to provide developer access to key information in your organization, information that could be sensitive, or that needs to be limited in use. Through the use of Products, Policies, and Security, it is possible to ensure that firm control is maintained over the API estate. The developer experience can be tailored to provide a virtual storefront to any APIs along with information and blogs to help drive deeper developer engagement. Although not discussed in this article, it is also possible for developers to publish their own applications to the API Management instance for other developers to use. Resources for Article: Further resources on this subject: Creating Multitenant Applications in Azure [article] Building A Recommendation System with Azure [article] Putting Your Database at the Heart of Azure Solutions [article]
Read more
  • 0
  • 0
  • 7712