The Cloud
The mythical creature known as "The Cloud" has become a juggernaut of marketing collateral that often makes those who are technologically inclined want to laugh hysterically or run for the hills. However, it is in fact a paradigm of Information Technology that has taken the market by storm and shows no sign of leaving any time soon. Aside from the marketing hype, the concept of the cloud is truly an evolution of IT that aims to make lives easier for those who use, manage, design, and implement technology. Within the notion of "The Cloud", there are three main Service Models, as per the National Institute of Standards and Technology (NIST) definition of Cloud Computing (http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf). These are areas of the cloud that differ in their advantages and disadvantages as well as in their goals and feature sets. The three service models are:
Infrastructure as a Service (IaaS)
Platform as a Service (PaaS)
Software as a Service (SaaS)
Each of these is listed "as a Service" because the cloud is largely about taking traditional components from Information Technology and offering them as a service either to customers or users within an organization in order to provide a more flexible environment. One thing to note here is that these service models are loosely coupled such that we may use them together, but we do not inherently require all the layers in order to create a cloud architecture.
It was mentioned that the cloud has layers. This is mostly an attempt to help us understand how it all fits together, where the distinction between the service models exists, the roles they play, and how each can be applicable to their target user base. We can visualize these different service models as layers built upon one another, not unlike that of a stack. The lower you are in the stack, the more components you, as a user, are responsible for managing, and the higher you are in the stack, the more your service provider is responsible for. The following diagram will show this example, and further explanation will follow in the coming sections in this chapter:
Infrastructure as a Service (IaaS)
Beginning at the bottom of our stack, we find the foundation upon which other layers are often built. This layer is known as Infrastructure as a Service (IaaS), and it is a natural evolution of the traditional virtualization technologies widely deployed in data centers all over the world. Within an IaaS cloud environment, all the aspects of an infrastructure are virtualized behind a layer of abstraction, which allows these components to be utilized in a more flexible manner. Often found within IaaS Clouds are virtualized compute nodes, which are equivalent to traditional virtual machines but are more dynamic or ephemeral in nature. Storage is virtualized as well and is deployed in a scaled-out approach, generally offering block storage either as ephemeral resources or as persistent disks. Also common among IaaS environments are virtual networks and virtual firewalls, which allow resources to be separated on the network by creating network security zones.
As a user of an IaaS Cloud, we do not have to file requests in a ticketing system and wait for the systems administration or operations team to provide resources. Instead, the service model lets us simply provision what is needed. IaaS offers its power and flexibility at this point, where we, as users, are left to make decisions on criteria such as:
Operating System Deployment (OSD)
Service Daemon Configuration (SDC)
Storage Provisioning (SP)
Network Configuration (NC)
Backups
While these items are criteria that contribute to the flexibility of IaaS, they also incur the overhead of needing a DevOps team or, at a minimum, someone on the staff who is knowledgeable in the area of DevOps and dedicated to the project at hand. There are a number of open source IaaS solutions that have gained considerable popularity, and they provide great examples and a wealth of documentation for readers who would like to continue their education in this space. This list is alphabetical and possibly not all-inclusive:
Apache CloudStack: https://cloudstack.apache.org/
Eucalyptus: http://www.eucalyptus.com/
Nimbus: http://www.nimbusproject.org/
OpenNebula: http://www.opennebula.org/
OpenStack: http://www.openstack.org/
Note
DevOps is a new paradigm where the Dev and Ops teams work together in order to meet the need for faster release cycles. The term was coined by a movement that arose in response to the widening gap between the Dev and Ops teams. It aims to solve the problem where a Dev team would write code and hand it over to the Ops team with very little coordination between the two. DevOps utilizes aspects of the cloud, configuration management, and automation tools to satisfy the Dev team's requirement for fast-moving environments and the Ops team's requirement for a stable and controlled infrastructure.
Common tools in this area include configuration management software, and readers interested in this area are encouraged to read up on one or more of the following (listed alphabetically):
Ansible: http://www.ansibleworks.com/
Chef: http://www.opscode.com/chef/
Puppet: http://puppetlabs.com/
Salt: http://saltstack.org/
Platform as a Service (PaaS)
Moving up one layer in our stack example, we find ourselves at Platform as a Service (PaaS). This service model aims to offer some of the flexibility of an IaaS while removing a great deal of the overhead such as the need to maintain the operating system, storage, deployment, provisioning, and configuration management. The offerings in this space will take the abstraction one level higher, and instead of virtualizing every component of the infrastructure that would normally be provisioned as hardware, PaaS effectively offers the pieces of a puzzle, which when put together, provide the platform on which applications can run. In a PaaS environment, the administrators, developers, or deployment managers of web applications can select the components upon which their application will run, such as the service daemon, programming language, web framework, and database. At this point, the end user's decision should focus more on whether the PaaS being reviewed offers features needed by the individual interested in hosting, deploying, or developing a particular application, along with its capacity, scaling, backup, and any other potential concerns.
Now, there are a number of PaaS providers available, and anyone looking to select one should indeed spend some time with their favorite search engine to find candidate providers. The top contenders should also be taken for a test drive before making any hard decisions. However, since this book is about OpenShift, I hope the reader has decided to use OpenShift, so other PaaS providers will not be discussed. One thing to note as a tie-in with the stack analogy is how some PaaS architectures are tightly integrated with IaaS using an Application Programming Interface (API). The API can be used as a means of automating tasks within IaaS from the perspective of PaaS, such as launching a new compute node, auto-configuring its storage and service daemons, and adding these new resources to the PaaS environment to increase capacity. Also note that even if PaaS Clouds are not integrated directly with IaaS, they are often deployed on top of IaaS because of the flexible nature of IaaS Clouds.
Software as a Service (SaaS)
Sitting on the top layer of the stack, Software as a Service (SaaS) is the cloud evolution of hosted web applications. This layer of abstraction removes the largest amount of control from the user or customer, who takes on the role of simply a user, or possibly an application administrator, while the entire platform upon which the application runs, as well as all the infrastructure concerns, is managed by the service provider. However, as with all things where there is "give", there must be "take", and in this scenario, the "give" is a loss of control and flexibility in terms of architectural decisions, choice of backend programming languages, frameworks, databases, and any other selections of technology used. The "take" side of this, and the reason why this service model has gained such widespread adoption, is that some organizations, companies, or teams do not have the expertise or the desire to take on the technical aspects of a hosted web application, or might consider such functions a burden. Common examples of SaaS offerings are Customer Relationship Management (CRM), Enterprise Resource Planning (ERP), Management Information Systems (MIS), as well as other essential business-focused software solutions.
SSH
Where did all the clouds go? Why are we talking about SSH all of a sudden? Well, we're talking about SSH because it is an important component of OpenShift as well as other PaaS Clouds. SSH stands for Secure Shell, and it is a network communications protocol that creates encrypted connections for remote command execution, shell sessions, and data transfer. From a user's standpoint, SSH is quite simple to use, but do not let that be an indication of its potential, as it is quite powerful. We will briefly discuss some simple SSH commands in the context of the OpenShift use cases, but first, we need to understand a couple of things about how SSH works so that we can set up some prerequisites. The first thing on our list of prerequisites is the fact that SSH offers authentication based on public/private key pairs, which is extremely common and is also used by OpenShift. The most popular implementation of SSH is arguably OpenSSH (http://www.openssh.com/), which is used by OpenShift. OpenSSH can also use other methods for authentication, such as passwords, single sign-on mechanisms, and even two-factor authentication. These alternatives are not covered here as they are not applicable to our coverage of OpenShift.
Once public keys are in place, something that OpenShift's client utilities will set up for us, we can simply run the following command to connect to a remote server in order to run commands in an interactive shell.
Note
If we are doing this against a server that is not an OpenShift Gear, we will have to verify whether the configurations are in place to allow for passwordless SSH; there are many guides on this online so we won't discuss it here.
Gears will be explained at length in a later section, but a Gear is effectively a GNU/Linux sandbox environment that is resource constrained and secured with SELinux.
user@mylaptop$ ssh user@server.example.com
user@server.example.com$
If you are using a GNU/Linux distribution or Mac OS X, you will most likely have an SSH client preinstalled; however, if you are a Windows user, you will need to install a third-party SSH client application such as PuTTY (http://www.putty.org/).
In the preceding example, the shell prompt, user@mylaptop$, is used to signify a shell on the local machine, and once the SSH connection is established, the prompt changes to user@server.example.com$, signifying that the shell session that is currently at our fingertips is on a different machine. While shell prompts will vary greatly in the wild because of the flexibility of their configuration, this should serve as a decent placeholder to understand that once we are typing into a shell prompt at user@server.example.com$, these commands are happening remotely.
The following diagram shows a simple layout of a client computer (such as a laptop) and a server system, along with a sample user account that resides on the server system, cleverly named user. It serves as a high-level overview of the introductory example we previously covered.
Note
There are actually a lot of steps going on in the background of this diagram that have to do with setting up the encrypted connection, but an in-depth coverage of these is not within the scope of this publication.
Another thing we can do with SSH, other than logging in to a remote shell, is execute single commands remotely and receive their output in the local terminal. The following example shows how to obtain our quota information from an OpenShift Gear using the quota command, without actually entering into an interactive shell session remotely.
OpenShift shell prompts do not actually look like this in real usage; the prompt in the example was modified to maintain consistency with the previous examples. The actual OpenShift prompts and SSH username formatting will be covered in later sections.
user@mylaptop$ ssh user@server.example.com 'quota'
Disk quotas for user user@server.example.com (uid 6017):
Filesystem blocks quota limit grace files quota limit grace
/dev/mapper/EBSStore01-user_home01 604 0 1048576 172 0 40000
As we can see here, the command was executed remotely and the output was sent back to us providing seamless interaction, almost as though we ran the command locally.
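Because the remote command's output arrives on our local standard output, it can be piped straight into local tools. The following is a minimal sketch, not something OpenShift provides: it feeds a copy of the sample quota output above through awk to report usage as a percentage of each limit. The function name quota_usage is hypothetical, and the field positions assume that the grace columns are empty, as they are in the sample.

```shell
# Sample text copied from the quota example above. In real use it would come
# from: ssh user@server.example.com 'quota'
quota_output='Disk quotas for user user@server.example.com (uid 6017):
Filesystem blocks quota limit grace files quota limit grace
/dev/mapper/EBSStore01-user_home01 604 0 1048576 172 0 40000'

# Hypothetical helper: skip the two header lines, then print block and file
# usage as a percentage of the hard limit. With empty grace columns the
# fields are: 1=filesystem 2=blocks 3=quota 4=limit 5=files 6=quota 7=limit.
quota_usage() {
  awk 'NR > 2 && NF >= 7 {
    printf "%s blocks=%.2f%% files=%.2f%%\n", $1, $2 * 100 / $4, $5 * 100 / $7
  }'
}

printf '%s\n' "$quota_output" | quota_usage
```

With a live Gear, the first command in the pipeline would be the ssh invocation itself, while the helper still runs entirely on the local machine.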
SSH is also often used for more than interactive shells and remote command execution. Many utilities in traditional Unix and Unix-like operating systems use, or have the option to use, SSH as their data transport in order to provide secure transmission of whatever data they need to move between two hosts. Common utilities in this category are rsync, scp, mercurial (hg), and git, which leads us into the next section, based on git.
Git
Once upon a time, developers would maintain complex directory structures of source code that would live on a central server. Members of the development team would mount the directory over a shared file system or develop collectively on the same server, both of which posed a laundry list of problems. There is a classification of utility known as Version Control Systems (VCSs), which solve these issues. VCSs create the ability to maintain a manifest of differentials between "commits" or "versions" of a code base and much more. The VCS of choice for code management and deployment with OpenShift is named Git. The following is an excerpt from the Git website (http://git-scm.com/):
"Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows".
Before we go too deep into the details of Git, there needs to be some discussion about a few Git concepts that are essential to understanding how Git functions and why it is so powerful for developers. The first of which is the notion of a branch. In Git, there is the source-code repository that has been initialized to be tracked, and within that repository, there can be many branches. A branch is effectively a sub-repository snapshot that maintains its own change logs, snapshots, metadata, and so on. A Git branch is not a unique concept as other version control systems share this feature, but many who have never experienced it might find it difficult to follow at first, so hopefully the following diagram will help to clarify:
In the preceding diagram, there are three lines, each representing a branch. A focal point to make note of is what is known as the master branch, which is created by default when you create and initialize a Git repository. It is worth noting that at the time a branch is created, it is a point-in-time snapshot of the code base from where the branch originates, and each branch can receive code commits independently of the others. Within the diagram, in this hypothetical Git repository, there are two other branches. One is called dev and another is called some_feature, both of which are meant to show that this is all the same code base but with deviations during the development timeline. The arrows moving between the branches introduce another concept from Git called a merge. In Git, when you merge from one branch to another, you are applying the change set or differential from another branch upon the current one. Git has a number of clever methods for accomplishing this task, but it should be mentioned that there is a possibility of a conflict that would have to be resolved before the merge operation can be completed. There are methods for mitigating the risk of merge conflicts, which will be discussed later in this section. The manner in which developers perform their branch-and-merge process is up to their respective development team. There are many approaches to branch/merge development cycles, each with advantages and disadvantages, and discussions of these exist far and wide on the Internet. It is advisable to spend some time researching to find the one that best fits a project's development style.
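To make the branch and merge concepts more concrete, here is a minimal sketch that builds a throwaway repository, creates a dev branch from master, commits on it, and merges it back. It assumes Git is already installed; the identity values and the file contents are placeholders chosen for illustration only.

```shell
# Work in a scratch directory so that no existing repository is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # ensure the first branch is named master
git config user.name "Demo User"          # placeholder identity, local to this repository
git config user.email "demo@example.com"

echo 'puts "Hello world!"' > app.rb
git add app.rb
git commit -q -m "Initial commit on master"

# Create dev as a point-in-time snapshot of master, then commit on it.
git checkout -q -b dev
echo 'puts "Hello from dev!"' >> app.rb
git commit -q -a -m "Add a second greeting on dev"

# Switch back to master and apply the changes from dev onto it.
git checkout -q master
git merge -q dev
git log --oneline
```

Since master received no commits of its own after dev was created, Git can complete this merge without any conflict; conflicts only become possible once both branches have changed the same part of a file.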
Note
This has been a very rapid discussion of Git concepts, and we have only scraped the surface of its power and distributed nature. It would be advisable to spend some time with the Git project's documentation (http://git-scm.com/doc) for users who are interested in the breadth of capability that Git offers.
Hopefully, there is enough background information covered up to this point in order to start working with Git, so we will first want to set up a couple of global parameters for good measure.
Note
While it was not covered here, it is assumed that Git is installed on the user's system. For GNU/Linux users of Debian-based distributions, this can be done by running apt-get install git as root (or by installing the git-all package to pull in all subpackages). On a Fedora- or Red Hat-based system, it can be accomplished by running yum install git as root. Other Linux distributions are likely to have an installable package named git in their respective repositories. For users of Mac OS X or Windows, please visit Git's download site (http://git-scm.com/downloads) in order to obtain your installation medium.
When using Git for the first time, the first order of business is to set a few global Git settings such as developer identity, editor of choice, and diff tool (for merges). Run the following commands as the system user that will be used for development (that is, as a non-root user), replacing the name and e-mail address with your own:
$ git config --global user.name "John Smith"
$ git config --global user.email johnsmith@example.com
Next up on the list will be to configure the editor of choice. Most developers like to use either vim or emacs, but these are certainly not the only editors in town, so use what fits best. We can configure the editor as follows:
$ git config --global core.editor vim
After these are in place, it would also be wise to configure a merge tool, which is used to assist when handling merge conflicts. On my system, which is Fedora 19 at the time of writing, the command git mergetool --tool-help lists the following as valid merge tools: araxis, bc3, codecompare, diffuse, ecmerge, emerge, gvimdiff, kdiff3, meld, opendiff, p4merge, tkdiff, tortoisemerge, vimdiff, and xxdiff. These are simply examples of merge utilities that can be used, and we should select one we feel comfortable with, or accept the defaults for our system if this is uncharted territory. For those using a GNU/Linux distribution as their development platform of choice, and who enjoy graphical environments, meld and kdiff3 have both received a lot of positive feedback and would likely be a decent place to start. As a vim user, I use vimdiff as my merge tool of choice, and we'll configure it as follows:
$ git config --global merge.tool vimdiff
There are also a number of other configurable Git variables, which may be found using either the Git documentation on the project's website or the git-config man page.
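Once the settings are in place, they can be read back to confirm that they were stored. The sketch below repeats the settings from this section and then queries them. So that it can be run without altering a real configuration, it points HOME at a scratch directory first; that redirection is an assumption made for the demonstration and should be left out when configuring an actual development account.

```shell
# Demonstration only: use a scratch HOME so that a real ~/.gitconfig is not modified.
export HOME=$(mktemp -d)
unset XDG_CONFIG_HOME

git config --global user.name "John Smith"
git config --global user.email johnsmith@example.com
git config --global core.editor vim
git config --global merge.tool vimdiff

# Read a single value back, then list everything stored at the global level.
git config --global --get user.name
git config --global --list
```

The --get form is convenient in scripts, while --list gives a quick overview of every global setting at once.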
Moving on, for the sake of the example, let's assume that there is an application we are going to write named my_app.
For simplicity, it will just be a simple "Hello World" example in Ruby, but it will be enough to cover the basic usage commands. First, we need a directory that we will turn into a Git repository using the following commands:
$ mkdir my_app
$ cd my_app
$ git init
Initialized empty Git repository in ~/my_app/.git/
That's it. That's the magic; we did it! See how easy that was? It is truly amazing how powerful Git is, considering how simple it is to use. Next up, we need to create a file named app.rb with the following contents:
#!/usr/bin/env ruby

puts "Hello world!"
Note
The #!/usr/bin/env ruby line is what is called a shebang, and it names the interpreter with which the file should be executed. This is a common Unix-ism and will have no effect in Windows environments.
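The effect of a shebang can be seen with a small experiment. The sketch below uses a shell script rather than Ruby so that it runs on systems without a Ruby interpreter; the file name hello.sh is arbitrary and chosen only for this illustration.

```shell
# Create a two-line script whose first line names /bin/sh as its interpreter.
workdir=$(mktemp -d)
cat > "$workdir/hello.sh" <<'EOF'
#!/bin/sh
echo "Hello world!"
EOF

# Mark the file as executable, then run it directly. The kernel reads the
# shebang line and hands the file to /bin/sh for us.
chmod +x "$workdir/hello.sh"
"$workdir/hello.sh"
```

The same mechanism is what would allow app.rb to be started as ./app.rb once it has been made executable, with env locating the ruby interpreter on the PATH.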
Since we have a file and some contents, we'll now need to add it to the repository so that it is tracked by Git, using the following command:
$ git add app.rb
Then to check the status of our Git repository, run the following command and you should get a similar output:
$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#       new file:   app.rb
#
The portion of this output that is of interest is the Changes to be committed part. This means we've added changes to a "staging" status and they are ready to be committed to the Git log. Also, we can add a commit message to provide some context about the contents of this commit. We will commit and then check the Git log; remember, Git maintains a log of all the code that is committed to the repository. Commit the code and view the Git log with the following commands:
$ git commit -m "Initial commit of app.rb, Hello World example"
[master (root-commit) 77839fd] Initial commit of app.rb, Hello World example
 1 file changed, 3 insertions(+)
 create mode 100644 app.rb
$ git log
commit 77839fdef6f17012797e93f05516d342570d31d6
Author: Adam Miller <maxamillion@fedoraproject.org>
Date:   Wed Jan 9 23:21:04 2013 -0600

    Initial commit of app.rb, Hello World example
One thing to note here is that if you were to run the command git show, it would show you the latest entry in the Git log, including the changes committed, as follows. We will see that the diff line contains two paths that don't really exist, a/app.rb and b/app.rb; these are effectively placeholders that show the differential between what app.rb used to be and what it is now within this Git branch:
$ git show
commit 77839fdef6f17012797e93f05516d342570d31d6
Author: Adam Miller <maxamillion@fedoraproject.org>
Date:   Wed Jan 9 23:21:04 2013 -0600

    Initial commit of app.rb, Hello World example

diff --git a/app.rb b/app.rb
new file mode 100644
index 0000000..2966711
--- /dev/null
+++ b/app.rb
@@ -0,0 +1,3 @@
+#!/usr/bin/env ruby
+
+puts "Hello world!"
In the preceding output, there is a commit ID, which is a unique identifier for this commit, followed by the Author and Date stamp for the commit.
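The abbreviated ID printed by git commit (77839fd in the example) and the long ID printed by git log refer to the same commit; the short form is simply a prefix of the full one. The following minimal sketch shows this in a throwaway repository, assuming Git is installed. The identity values are placeholders, and the IDs it prints will differ from those shown in this chapter.

```shell
# Build a scratch repository containing a single commit.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity for this sketch
git config user.email "demo@example.com"
printf '#!/usr/bin/env ruby\n\nputs "Hello world!"\n' > app.rb
git add app.rb
git commit -q -m "Initial commit of app.rb, Hello World example"

# Resolve HEAD to the full commit ID and to its abbreviated form.
full_id=$(git rev-parse HEAD)
short_id=$(git rev-parse --short HEAD)
echo "full:  $full_id"
echo "short: $short_id"
```

Most Git commands that take a commit accept either form, as long as the abbreviation is unambiguous within the repository.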
Note
A quick side note is that date stamps are not always in the chronological order we might expect. This can happen in a number of ways, but the most common causes are differing time zones among committers in a distributed development model and merges that intermingle commits.
After the commit ID, the Author, and the Date stamp come the commit message and the diff. Those familiar with the diff and patch tools will feel right at home with this output format and its meaning. If this is new territory, fret not, as the output is relatively straightforward: lines with a + character prepended are additions to the file, lines with a - character prepended are removals from the file, lines without any prepended character are unmodified, and lines with the @@ characters mark the line offsets within the file at which each block of changes begins.
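The same + and - notation can be produced outside of Git with the standard diff utility, which makes for an easy way to get used to reading it. This is a minimal sketch; the file names old.rb and new.rb and their contents are made up for the illustration.

```shell
# Two versions of a one-line file, kept in a scratch directory.
workdir=$(mktemp -d)
printf 'puts "Hello world!"\n' > "$workdir/old.rb"
printf 'puts "Hello, OpenShift!"\n' > "$workdir/new.rb"

# -u requests the unified format that Git also uses. diff exits with
# status 1 when the files differ, so "|| true" keeps that from being
# mistaken for a failure.
diff -u "$workdir/old.rb" "$workdir/new.rb" || true
```

The output begins with a pair of header lines naming the two files, followed by an @@ line and then the removed line prefixed with - and the added line prefixed with +.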
If the Git repository we were working with had not been initialized on our local machine, but instead had been cloned from a remote repository, which is what happens when you use OpenShift, there would be one more command needed to propagate this commit to the remote server: git push. Recall that Git is distributed, and therefore the commit we made previously was only to our local repository. By performing a git push, we are "pushing" those changes out to a remote location. The default remote location in Git nomenclature is known as origin, but we need not supply that information to the command because it is assumed by default.
Note
Note that the following output is from an OpenShift Git repository and will contain some output that might not be very meaningful, but don't worry as this will be covered at length in the later sections.
$ git push
Counting objects: 4, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 290 bytes, done.
Total 3 (delta 1), reused 0 (delta 0)
remote: restart_on_add=false
remote: Waiting for stop to finish
remote: Done
remote: restart_on_add=false
remote: ~/git/sinatra.git ~/git/sinatra.git
remote: ~/git/sinatra.git
remote: Running .openshift/action_hooks/pre_build
remote: Running .openshift/action_hooks/build
remote: Running .openshift/action_hooks/deploy
remote: hot_deploy_added=false
remote: Done
remote: Running .openshift/action_hooks/post_deploy
To ssh://891a6370bd884b348305552b1c9485e7@sinatra-admiller.rhcloud.com/~/git/sinatra.git/
   bab6f7c..e703aa8  master -> master
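The clone, commit, and push cycle can be practiced without any server at all. In the following minimal sketch, a bare repository on the local disk stands in for the remote; it is only a stand-in and runs none of the OpenShift hooks seen in the output above. All paths and identity values are placeholders, and Git is assumed to be installed.

```shell
# A bare repository in a scratch directory plays the role of the remote.
base=$(mktemp -d)
git init -q --bare "$base/remote.git"

# Cloning registers that location under the default remote name, origin.
# Git warns that the repository is empty, which is expected here.
git clone -q "$base/remote.git" "$base/work"
cd "$base/work"
git symbolic-ref HEAD refs/heads/master   # ensure the first branch is named master
git config user.name "Demo User"          # placeholder identity for this sketch
git config user.email "demo@example.com"

echo 'puts "Hello world!"' > app.rb
git add app.rb
git commit -q -m "Initial commit of app.rb"

# The remote is empty, so there is no upstream branch to assume yet;
# the first push therefore names the remote and the branch explicitly.
git push -q origin master

# List the branches the remote now holds.
git ls-remote origin
```

The explicit form is used here because a freshly created empty remote has no branch for a plain git push to default to. A repository cloned from a remote that already has content, as in the OpenShift example above, arrives with that relationship already configured.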