The mythical creature known as "The Cloud" has become a juggernaut of marketing collateral that often makes those who are technologically inclined want to laugh hysterically or run for the hills. However, it is in fact a paradigm of Information Technology that has taken the market by storm and has no inclination of leaving any time soon. Aside from the marketing hype, this concept of the cloud is truly an evolution of IT that aims to make lives easier for those who use, manage, design, and implement technology. Within the notion of "The Cloud", there are three main Service Models, as per the National Institute of Standards and Technology (NIST) definition of Cloud Computing (http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf), or areas of the cloud that are different in their advantages and disadvantages as well as their goals and feature sets. The three service models are:
It was mentioned that the cloud has layers. This is mostly an attempt to help us understand how it all fits together, where the distinction between the service models exists, the roles they play, and how each can be applicable to their target user base. We can visualize these different service models as layers built upon one another, not unlike that of a stack. The lower you are in the stack, the more components you, as a user, are responsible for managing, and the higher you are in the stack, the more your service provider is responsible for. The following diagram will show this example, and further explanation will follow in the coming sections in this chapter:
Platform as a Service (PaaS)
Moving up one layer in our stack example, we find ourselves at Platform as a Service (PaaS). This service model aims to offer some of the flexibility of an IaaS while removing a great deal of the overhead such as the need to maintain the operating system, storage, deployment, provisioning, and configuration management. The offerings in this space will take the abstraction one level higher, and instead of virtualizing every component of the infrastructure that would normally be provisioned as hardware, PaaS effectively offers the pieces of a puzzle, which when put together, provide the platform on which applications can run. In a PaaS environment, the administrators, developers, or deployment managers of web applications can select the components upon which their application will run, such as the service daemon, programming language, web framework, and database. At this point, the end user's decision should focus more on whether the PaaS being reviewed offers features needed by the individual interested in hosting, deploying, or developing a particular application, along with its capacity, scaling, backup, and any other potential concerns.
Now, there are a number of PaaS providers available and anyone looking to select one should indeed spend some time with their favorite search engine to find candidates interested in becoming their provider. The top contenders should also be taken for a test drive before making any hard decisions. However, since this book is about OpenShift, I hope the reader has decided to use OpenShift, and other PaaS providers will not be discussed as such. One thing to note as a tie-in with the stack analogy is how some PaaS architectures are tightly integrated with IaaS using an Application Programming Interface (API). The API can be used as a means of automating tasks within IaaS from the perspective of PaaS, such as launching a new compute node, auto-configuring its storage and services daemons, and adding these new resources to the PaaS environment to increase capacity. Also note that even if PaaS Clouds are not integrated directly to IaaS, they are often deployed on top of IaaS because of the flexible nature of IaaS Clouds.
Where did all the clouds go? Why are we talking about SSH all of a sudden? Well, we're talking about SSH because it is an important component of OpenShift as well as other PaaS Clouds. SSH is an acronym for Secure Shell and it is a network communications protocol that creates encrypted connections for remote command executions, shell sessions, and data transfer. From a user's standpoint, SSH is quite simple to use, but do not let that be an indication of its potential as it is quite powerful. We will briefly discuss some simple SSH commands in context to the OpenShift use cases, but first, we need to understand a couple of things about how SSH works so that we can set up some prerequisites. The first thing on our list of prerequisites is the fact that SSH offers public- or private-key-based authentication, which is extremely common and is also used by OpenShift. The most popular implementation of SSH is arguably OpenS
SH (http://www.openssh.com/), which is used by OpenShift. OpenSSH can also use other methods for authentication, such as passwords, Single sign-on mechanisms, and even Two-factor authentication. These alternatives are not covered here as they are not applicable to our coverage of OpenShift.
Once public keys are in place, something that OpenShift's client utilities will set up for us, we can simply run the following command to connect to a remote server in order to run commands in an interactive shell.
Note
If we are doing this against a server that is not an OpenShift Gear, we will have to verify whether the configurations are in place to allow for passwordless SSH; there are many guides on this online so we won't discuss it here.
Gears will be explained at length in a later section, but it's effectively a GNU/Linux sandbox environment that is resource constrained and secured with SELinux.
If you are using a GNU/Linux distribution or Mac OS X, you will most likely have an SSH client preinstalled; however, if you are a Windows user, you will need to install a third-party SSH client application such as PuTTY (http://www.putty.org/).
In the preceding example, the shell prompt, user@mylaptop$
, is used to signify a shell on the local machine, and once the SSH connection is established, the prompt changes to user@server.example.com$
, signifying that the shell session that is currently at our fingertips is on a different machine. While shell prompts will vary greatly in the wild because of the flexibility of their configuration, this should serve as a decent placeholder to understand that once we are typing into a shell prompt at user@server.example.com$
, these commands are happening remotely.
The following diagram shows a simple layout of a client computer (such as a laptop) and a server system, along with a sample user account that resides on the server system, cleverly named user
that will offer itself as a high-level overview of the introductory example we previously covered.
Note
There are actually a lot of steps going on in the background of this diagram that have to do with setting up the encrypted connection, but an in-depth coverage of these is not within the scope of this publication.
Another thing we can do with SSH, other than logging in to a remote shell, is execute single commands remotely and receive their output in the local terminal. The following example will display how to obtain our quota information from an OpenShift Gear using the quota
command, without actually entering into an interactive shell session remotely.
OpenShift shell prompts do not actually look like this in real usage; the prompt in the example was modified to maintain consistency with the previous examples. The actual OpenShift prompts and SSH username formatting will be covered in later sections.
As we can see here, the command was executed remotely and the output was sent back to us providing seamless interaction, almost as though we ran the command locally.
SSH is often just used outside interactive shells and remote-command execution. Many utilities in traditional Unix and Unix-like operating systems use or have the option to use SSH as their data transport in order to provide secure transmission of whatever data they need to move between two hosts. Common utilities in this category are rsync
, scp
, mercurial
(hg
), and git
, which leads us into the next section based on git
.
Once upon a time, developers would maintain complex directory structures of source code that would live on a central server. Members of the development team would mount the directory over a shared file system or develop collectively on the same server, both of which posed a laundry list of problems. There is a classification of utility known as Version Control Systems (VCSs), which solve these issues. VCSs create the ability to maintain a manifest of differentials between "commits" or "versions" of a code base and much more. The VCS of choice for code management and deployment with OpenShift is named Git. The following is an excerpt from the Git website (http://git-scm.com/):
"Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency.
Git is easy to learn and has a tiny footprint with lightning fast performance. It outclasses SCM tools like Subversion, CVS, Perforce, and ClearCase with features like cheap local branching, convenient staging areas, and multiple workflows".
Before we go too deep into the details of Git, there needs to be some discussion about a few Git concepts that are essential to understanding how Git functions and why it is so powerful for developers. The first of which is the notion of a branch. In Git, there is the source-code repository that has been initialized to be tracked, and within that repository, there can be many branches. A branch is effectively a sub-repository snapshot that maintains its own change logs, snapshots, metadata, and so on. A Git branch is not a unique concept as other version control systems share this feature, but many who have never experienced it might find it difficult to follow at first, so hopefully the following diagram will help to clarify:
In the preceding diagram, there are three lines, each representing a branch. A focal point to make note of is what is known as the master branch, which is created by default when you create and initialize a Git repository. It stands to note that at the time a branch is created, it is a point-in-time snapshot of the code base from where the branch originates, and each branch can receive code commits independently from one another. Within the diagram, in this hypothetical Git repository, there are two other branches. One is called dev
and another is called some_feature
, both of which are meant to show that this is all the same code base but has deviations during the development timeline. The arrows moving between the branches introduce another concept from Git called a merge. In Git, when you merge from one branch to another, you are applying the change set or differential from another branch upon the current one. Git has a number of clever methods for accomplishing this task, but it should be mentioned that there is a possibility of a conflict that would have to be resolved before the merge operation can be completed. There are methods for mitigating the risk of merge conflicts, which will be discussed later in this section. The manner in which developers perform their branch-and-merge process is up to their respective development team. There are many approaches to branch/merge development cycles, each with advantages and disadvantages, and discussions of these exist far and wide on the Internet. It is advisable to spend some time researching to find the one that best fits a project's development style.
Note
This has been a very rapid discussion of Git concepts, and we have only scraped the surface of its power and distributed nature. It would be advisable to spend some time with the Git project's documentation (http://git-scm.com/doc) for users who are interested in the breadth of capability that Git offers.
Hopefully, there is enough background information covered up to this point in order to start working with Git, so we will first want to set up a couple of global parameters for good measure.
Note
While it was not covered here, it is assumed that Git is installed on the user's system. For GNU/Linux users of debian-based distributions, this can be done with apt-get install git
as the root (or the git-all
package to pull in all subpackages) or from a Fedora- or Red Hat-based system, it can be accomplished using yum install git
as the root. Other Linux distributions are likely to have the installable package name of git
in their respective repositories. For users of Mac OS X or Windows, please visit Git's download site (http://git-scm.com/downloads) in order to obtain your installation medium.
When using Git for the first time, the first order of business is to set a few global Git settings such as developer identity, editor of choice, and diff tool (for merges). Run the following commands as the system user (that is, as a non-root user), which will be used for development, replacing the name and e-mail address with your own:
Next up on the list will be to configure the editor of choice. Most developers like to use either vim
or emacs
, but these are certainly not the only editors in town, so use what fits best. We can configure the editor as follows:
After these are in place, it would also be wise to configure a merge tool, which is used to assist when handling the merge conflicts. On my system, which is Fedora 19, at the time of writing, the command git mergetool –tool-help
lists the following as valid entries as a merge tool: araxis
, bc3
, codecompare
, diffuse
, ecmerge
, emerge
, gvimdiff
, kdiff3
, meld
, opendiff
, p4merge
, tkdiff
, tortoisemerge
, vimdiff
, and xxdiff
. These tools are simple examples of merge utilities that can be used, and we should select one we feel comfortable with, or accept the defaults for your system if this is uncharted territory. For those using a GNU/Linux distribution as their development platform of choice, and who enjoy graphical environments, meld
and kdiff3
have both received a lot of positive feedback and would likely be a decent place to start. As a vim
user, vimdiff
is the merge tool of choice and we'll configure it as follows:
There are also a number of other configurable Git variables, which may be found using either the Git documentation found on their website or via the git-config
main page.
Moving on, for the sake of the example, let's assume that there is an application we are going to write named my_app.
For simplicity, it will just be a simple "Hello World" example in Ruby, but it will be enough to cover the basic usage commands. First, we need a directory that we will turn into a Git repository using the following commands:
That's it. That's the magic; we did it! See how easy that was? It is truly amazing how powerful Git is, considering how simple it is to use. Next up, we need to create a file named app.rb
with the following contents:
Tip
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com.If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to havethe files e-mailed directly to you.
Note
The #!/usr/bin/env ruby
line is what is called a shebang, and it defines the environment in which the file should be executed. This is a common Unix-ism and will have no effect on the Windows environments.
Since we have a file and some contents, we'll now need to add it to git
in order to be tracked by Git using the following command:
Then to check the status of our Git repository, run the following command and you should get a similar output:
The portion of these lines of commands that is of interest is the Changes to be committed
part. This means we've added changes to a "staging" status and it is ready to be committed to the Git log. Also, we can add a commit
message to provide some context to what the contents of this commit are. We will commit and then check the Git log; remember, Git maintains a log of all the code that is committed to the repository. Commit the code and view the Git logs with the following commands:
One thing to note here is that if you were to run the command, git show
, it will show you the latest entry in the Git log, including the changes committed as follows. We will see the line start with two paths that don't really exist, a/app.rb
and b/app.rb
, these are effectively placeholders that show the differential between what app.rb
used to be and what it is now within this Git branch:
In the preceding output, there is a commit ID, which is a unique identifier for this commit, followed by the Author and Date stamp for the commit.
Note
A quick side mention that should be considered is that date stamps are not always chronologically ordered as we might think they should be, and this can happen in a number of ways, but most commonly, are going to be time zones of commits in a distributed development model or merges intermingling commits.
After the commit ID, the Author, and the Date stamp, is the commit message andthe diff. For those familiar with the diff
and patch
tools, they will feel right at home with this output formatting and its meanings. If this is new territory, fret not as the output is relatively straightforward: the lines with a +
character prepended are additions to the file, lines with a -
character prepended are removals from the file, lines without any prepended characters are not modified, and lines with the @@
characters are offsets in the file.
If the Git repository we were working with had not been initialized on our local machine, but instead had been cloned from a remote repository, which is what happens when you use OpenShift, there would be one more command needed to propagate this commit to the remote server: git push
. Do you remember we have mentioned before that Git is distributed, and therefore, the commit we made previously was only to our local repository? By performing a git push
, we are "pushing" those changes out to a remote location. The default remote location in Git nomenclature is known as origin
, but we need not supply that information to the command because by default it is assumed.
Note
Note that the following output is from an OpenShift Git repository and will contain some output that might not be very meaningful, but don't worry as this will be covered at length in the later sections.