Using version control with GitHub
Version control is a process by which you track and record changes made to a file or set of files over time, so that you can retrieve specific versions later when you need them. Although any kind of file on a computer can be versioned, in this book we will focus on versioning your scripts. Traditionally, most of us keep versions of our work in some ad hoc way, either by saving multiple copies or by copying files to another location. This process is very error-prone, so dedicated version control systems were devised as a more reliable solution. Today, especially in software development projects where many people work on the same code, a version control system that keeps track of who changed what is essential for maintaining and managing the project.
To solve this problem, centralized version control systems came into existence; CVS, Subversion, and Perforce are a few examples. These systems were easy to maintain, but they had the serious issue of putting all the eggs in one basket. If the central server goes down, no one can collaborate or commit their work until it is back up. Worse, if its hard disk crashes or its data gets corrupted, all the data is lost.
Distributed version control systems were developed to address these issues. In a distributed version control system, clients do not just check out the latest snapshot of the files from a central server; they also maintain a full replica of the repository on their local system. So, if something happens to the main server, it can be restored by simply copying the data from any of the local copies. In effect, every client acts as a full backup of the server's data. Git, Mercurial, and Darcs are a few examples of such tools.
Among all the distributed version control systems, Git is the most widely used because of the following advantages:
- Speed
- Support for thousands of parallel branches of the same project
- A fully distributed architecture
- Efficient handling of very large projects
- A simple design
The following are a few of the differences between Git and other version control systems:
- Many traditional version control systems store information as a list of file-based changes: for each file, only the deltas between successive versions are saved. Git, in contrast, treats its data as a series of snapshots of the whole project.
- Most operations in Git are local. To browse the history or retrieve an old version, you do not need to be connected to a remote server. You can also make and commit changes offline; they are saved locally, and you can push them to the remote server later. This is a large part of what makes Git fast.
- Everything in Git is checksummed (with a SHA-1 hash) before it is stored, and is then referred to by that checksum. So, it is nearly impossible to change the stored information without Git knowing about it.
- Nearly all Git operations only add data. This makes it very difficult to lose data in Git, and because the data is usually pushed to other repositories as well, you can almost always recover from unexpected corruption.
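To see the checksumming in action, the following sketch asks Git for the object ID it would assign to a piece of text and then recomputes the same SHA-1 by hand. It assumes bash, Git on your PATH, the default SHA-1 object format, and a `sha1sum` binary (on macOS, `shasum` produces the same digest). Without `-w`, `git hash-object` only prints the ID and writes nothing to any repository.

```shell
#!/usr/bin/env bash
# Sketch: Git identifies content by a SHA-1 checksum of that content.
set -e

# The object ID Git would give this text (nothing is stored without -w).
git_id=$(printf 'test content\n' | git hash-object --stdin)

# Git hashes a header "blob <size><NUL>" followed by the content itself.
# "test content\n" is 13 bytes long.
manual_id=$(printf 'blob 13\000test content\n' | sha1sum | cut -d' ' -f1)

# Changing a single character yields a completely different ID.
changed_id=$(printf 'test contenT\n' | git hash-object --stdin)

echo "git:     $git_id"
echo "manual:  $manual_id"
echo "changed: $changed_id"
```

Because the ID is derived from the content, any modification to stored data would change its checksum, which is how Git notices corruption.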
This was a very short discussion of Git; we covered it because GitHub is built entirely on Git. In this book, we will focus on GitHub and how to use it, for the following reasons:
- Git is a command-line tool and can be intimidating for someone who does not program every day. For a scripter, GitHub is a far better solution.
- Bringing your scripts to GitHub connects you to the social network of collaboration, and you can work with others in a more collaborative way.
So, without further ado, let's dive into GitHub. GitHub is the largest host for Git repositories; millions of developers work on projects hosted there. To use GitHub, you first need an account. You can create one by visiting https://github.com/ and filling in the sign-up form. Note the e-mail ID that you used to create the account, as you will use the same e-mail ID to connect your local repository to this account later.
Once you log in to GitHub, you can create an SSH key pair so that your local system can work with your GitHub repositories. For security reasons, you should also enable two-factor authentication for your account. To do so, perform the following steps:
- Log in to your account and go to Settings (top-right corner).
- On the left-hand side, under the Personal settings category, choose Security.
- Click on Set up two-factor authentication.
- Choose whether to receive authentication codes through an app or by SMS, and follow the prompts.
You have now created an account and set up two-factor authentication. Since we also want to work on our local systems, we need to install Git locally. Go ahead and download the version for your operating system from http://git-scm.com/downloads.
You can configure Git in two ways: from the command line, or through GitHub's GUI application, which either includes Git or can install it for you. For my examples, I have used GitHub Desktop, which can be downloaded from https://desktop.github.com/. First, let's start with the command-line tool.
Open the command-line tool and run the following commands to configure the environment:
git config --global user.name "Your Name"
git config --global user.email "email@email.com"
You need to replace Your Name with your name and email@email.com with the e-mail address with which you created your account in GitHub.
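To confirm what Git will record as the author of your commits, you can read the values back with `git config`. The sketch below also shows a detail that is handy for scripters: running `git config` without `--global` inside a repository sets the value for that repository only, overriding the global value there. The temporary repository and the identity "Work Name" / work@example.com are placeholders for illustration.

```shell
#!/usr/bin/env bash
# Sketch: set and read back a repository-specific identity.
set -e

repo=$(mktemp -d)            # throwaway repository for the demo
git init -q "$repo"
cd "$repo"

# Without --global, these settings are stored in this repository's .git/config.
git config user.name "Work Name"
git config user.email "work@example.com"

# Reading a key prints the value that applies here (local overrides global).
name=$(git config user.name)
email=$(git config user.email)
echo "Commits in $repo will be authored by: $name <$email>"

# List every setting together with the file it came from.
git config --list --show-origin
```

Running `git config --global user.name` (with no value) in the same way prints the global setting you configured above.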
You can set the same values using the GitHub application as well. Once you have installed it, go to Preferences and then Accounts, and log in with the account that you created on the GitHub site. This connects the application to your GitHub account.
Next, go to the Advanced tab and fill in the details that you provided in the previous configuration under the Git Config section. Also, under the Command Line section, click on Install Command Line Tools. This will install the GitHub command-line utility on the system.
Now that we have installed everything we need, let's go ahead and create our first repository.
To create a repository, log in to your account in GitHub, and then click on the + New Repository option.
Next, provide a name for the repository, provide a description, and select whether you want to make it Private or Public. You can also select Initialize this repository with a README.
Once the preceding information is provided, click on Create Repository. This will create a new repository under your name and you would be the owner of the repository.
Before we go ahead and talk more about using GitHub, let's talk about a few concepts and how they work in GitHub.
GitHub supports two collaboration models.
The fork & pull model
In this model, anyone can fork an existing repository and push changes to their personal fork, without needing any permission on the source repository. When the changes in the fork are ready to go into the original repository, the contributor opens a pull request, and the project owner pulls the changes in. This reduces the up-front coordination needed between team members and collaborators, who can work more independently. This model is popular among open source collaborators.
Tip
Traditionally, fork means a deviation from the original project. In the GitHub environment, however, a fork is simply a copy of an existing project that you can work on and then merge back into the original project.
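GitHub performs the fork on its own servers, but the underlying Git mechanics of the fork & pull model can be sketched locally. In the sketch below, two bare repositories stand in for the original project (`upstream.git`) and your fork (`fork.git`); all paths, the file `script.ps1`, and the branch name `fix-typo` are made up for illustration. The contributor pushes only to the fork and keeps an `upstream` remote to pick up the owner's changes; on GitHub, a pull request would then ask the owner to pull the branch.

```shell
#!/usr/bin/env bash
# Sketch: the fork & pull model, simulated with local bare repositories.
set -e
base=$(mktemp -d)
cd "$base"

# 1. The owner's original project (stand-in for the GitHub repository).
git init -q owner
cd owner
git config user.name "Owner"
git config user.email "owner@example.com"
echo 'Get-VM' > script.ps1
git add script.ps1
git commit -q -m "Initial script"
git branch -M master
cd "$base"
git clone -q --bare owner upstream.git

# 2. The "fork": a full copy you own; no push access to upstream is needed.
git clone -q --bare upstream.git fork.git

# 3. Work in a clone of your fork, and remember where the original lives.
git clone -q fork.git work
cd work
git config user.name "Contributor"
git config user.email "contributor@example.com"
git remote add upstream "$base/upstream.git"
git fetch -q upstream

# 4. Commit on a branch and push it to the fork only.
git checkout -q -b fix-typo
echo 'Get-VM | Sort-Object Name' > script.ps1
git commit -q -am "Sort VMs by name"
git push -q origin fix-typo
# A pull request would now ask the owner to pull fix-typo from fork.git.
```

Note that `upstream.git` never receives the branch directly; only the owner decides whether the change goes in.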
The shared repository model
In the shared repository model, everyone working on the project is granted push access to the original repository, and thus, anyone working on this project can update the original project. This is mainly used in small teams or private projects where organizations collaborate to work on a project.
Pull requests are especially useful in the fork & pull model, as they notify the project maintainer about the changes that have been made. They also start the code review and discussion of the changes before they are merged into the original project.
Branch
When you create a repository, it has a single default branch called master. So, how does another person work on the same project? They create a branch for themselves. A branch is a parallel line of development that starts as a copy of master. You can make all your changes on the branch, and when you are ready, you can merge them back into master.
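The following sketch shows the idea with plain Git commands: a commit made on a branch leaves master untouched until you merge it. The repository lives in a throwaway temporary directory, and the branch name `my-feature` and file `notes.txt` are illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: changes on a branch stay off master until they are merged.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo "first line" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"
git branch -M master           # make sure the default branch is called master

git checkout -q -b my-feature  # create the branch and switch to it
echo "second line" >> notes.txt
git commit -q -am "Add second line on my-feature"

git checkout -q master
lines_before=$(wc -l < notes.txt)   # master still has only the first line

git merge -q -m "Merge my-feature" my-feature   # fast-forward in this case
lines_after=$(wc -l < notes.txt)    # master now has both lines
echo "before merge: $lines_before line(s), after merge: $lines_after"
```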
To summarize, a typical GitHub workflow is as follows:
- Create a branch from master.
- Make some changes of your own.
- Push this branch to your GitHub project.
- Open a pull request in GitHub.
- Discuss the changes, and if required, continue working on the changes.
- The project owner merges or closes the pull request.
You can follow this workflow from the command line using Git or from the GitHub GUI. For those who prefer the CLI, the main commands are as follows:
- git init: Initializes a directory as a new Git repository on your local system. Until this is done, there is no difference between a normal directory and a Git repository.
- git help: Shows a list of commonly used Git commands.
- git status: Shows the status of the repository, such as which files are untracked, modified, or staged.
- git add: Adds files to the Git index (the staging area), marking them for the next commit.
- git commit: Records the staged changes as a new snapshot of the repository.
- git branch: Lists the branches, or creates a new branch from the current one.
- git checkout: Switches to another branch (or restores files from a given version) and updates your working directory to match.
- git merge: As the name suggests, merges the changes from another branch into the current branch, for example, your branch into master.
- git push: Pushes the changes you made on your local computer to the GitHub online repository so that other collaborators can see them.
- git pull: Brings the latest changes from the GitHub repository down to your local computer.
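One way to see how git add and git commit relate is to watch `git status --short` between them: `??` marks an untracked file, `A` in the first column marks a new file staged in the index, and a clean working tree prints nothing. The sketch below uses a throwaway repository and an illustrative file name, `hello.ps1`.

```shell
#!/usr/bin/env bash
# Sketch: a file moving from untracked, to staged (index), to committed.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo 'Write-Host "Hello"' > hello.ps1
untracked=$(git status --short)   # Git sees the file but does not track it yet

git add hello.ps1
staged=$(git status --short)      # the new file is now in the index

git commit -q -m "Add hello.ps1"
clean=$(git status --short)       # empty: the snapshot matches the working tree

printf 'untracked: %s\nstaged:    %s\nclean:     [%s]\n' "$untracked" "$staged" "$clean"
git --no-pager log --oneline      # one commit in the history
```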
Since we have already created an online repository named powercli_scripts, let's create a local repository and sync the two.
To create a local repository, all you need to do is create a local directory and then run the git init command from inside it:
sdebnath:~ sdebnath$ mkdir git
sdebnath:~ sdebnath$ cd git
sdebnath:git sdebnath$ git init
Initialized empty Git repository in /Users/sdebnath/git/.git/
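To check that git init worked, you can ask Git directly instead of looking for the hidden .git folder yourself. The sketch below uses a temporary directory in place of the `~/git` directory from the example above.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a directory is now a Git repository.
set -e
dir=$(mktemp -d)     # stands in for ~/git
cd "$dir"
git init -q

ls -a                                            # the new .git directory appears
inside=$(git rev-parse --is-inside-work-tree)    # "true" inside a work tree
top=$(git rev-parse --show-toplevel)             # root of the repository
echo "inside work tree: $inside, top level: $top"
git status                                       # reports that there are no commits yet
```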
To do the same with the GUI tool, open the GitHub application, and then from the File menu, select Add Local Repository.
This will bring up a pop-up window saying that This folder is not a repository and asking if you want to create and add the repository. Click on Create and Add. This will create a local repository for you.
Now, let's go to the directory, create a file, and put some text into it. Once the file is created, we will check the status of the repository, which will report that there are untracked files. Then, we will tell Git to track the new file by adding it to the index, and finally, we will commit the change so that Git takes a snapshot. Here is the list of commands:
$ cd git
$ touch README.txt
$ echo "Hello there, first document in the repository" > README.txt
$ git status
$ git add README.txt
$ git commit -m "README.txt added"
The following is a screenshot of the above commands and the output that we get for a successful run.
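After the commit, git log lets you confirm that the snapshot was recorded. The sketch below repeats the commands from this step in a temporary directory (instead of `~/git`), sets a placeholder identity for that repository only so the commit succeeds even without the global configuration, and then inspects the history.

```shell
#!/usr/bin/env bash
# Sketch: repeat the first commit and inspect what Git recorded.
set -e
repo=$(mktemp -d)    # stands in for ~/git
cd "$repo"
git init -q
git config user.name "Demo User"          # placeholder identity, this repo only
git config user.email "demo@example.com"

echo "Hello there, first document in the repository" > README.txt
git add README.txt
git commit -q -m "README.txt added"

git --no-pager log --oneline              # one line per commit: short ID and message
git --no-pager show --stat HEAD           # author, date, message, and files changed
count=$(git rev-list --count HEAD)        # number of commits reachable from HEAD
subject=$(git log -1 --format=%s)         # message of the latest commit
```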
Now, we need to add the remote repository. We do this by running the following command:
$ git remote add origin https://github.com/yourname/repository.git
Replace yourname with your username and repository with your repository name.
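After adding the remote, the usual next steps are to check it with `git remote -v` and to publish your local commits with `git push -u origin master`, where `-u` records origin/master as the upstream so that later pushes and pulls need no arguments. So that the sketch runs without a GitHub account, a local bare repository stands in for https://github.com/yourname/repository.git; against GitHub you would use the real URL and sign in when Git asks for credentials (over HTTPS, GitHub expects a personal access token rather than your password).

```shell
#!/usr/bin/env bash
# Sketch: connect a local repository to a remote and push to it.
# A local bare repository stands in for GitHub (assumption for the demo).
set -e
base=$(mktemp -d)
git init -q --bare "$base/remote.git"     # plays the role of the GitHub repository

git init -q "$base/local"
cd "$base/local"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "Hello there, first document in the repository" > README.txt
git add README.txt
git commit -q -m "README.txt added"
git branch -M master

git remote add origin "$base/remote.git"  # with GitHub: https://github.com/yourname/repository.git
git remote -v                             # lists the fetch and push URLs
git push -q -u origin master              # publish master and track origin/master
upstream=$(git rev-parse --abbrev-ref '@{u}')
echo "master now tracks $upstream"
```

If the GitHub repository already has commits (for example, because you selected Initialize this repository with a README), the push is rejected until you bring those commits down first; `git pull origin master --allow-unrelated-histories` does that with reasonably recent Git versions.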
You can do the same through GitHub's GUI application as well. Open the application, go to Preferences, and under Accounts, log in to your GitHub account. Once done, you can create a branch of the repository.
Let's create a my-changes branch from master. Click on the Branch icon next to master (as shown in the following screenshot):
Once you do this, your working branch changes to my-changes. Now, add a file to your local repository, say, changes.txt, and add some text to it:
$ touch changes.txt
$ echo "Changes I made" > changes.txt
The changes that you made will immediately be visible in the GitHub application. Commit them to the my-changes branch.
Then, from the Repository menu, select Push to push the changes to GitHub.
Next, I added another file and committed it to my-changes as well.
This leaves the local and remote repositories in the Unsynced state. Click on the Sync button on the right-hand side to sync them.
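If you prefer the command line, the branch, commit, push, and sync steps above can be sketched as follows. This is an approximation, not a transcript of what the GUI runs: the Sync button is modeled loosely as a pull followed by a push, a local bare repository again stands in for GitHub, and `more-changes.txt` is a made-up name for the second file.

```shell
#!/usr/bin/env bash
# Sketch: command-line version of the my-changes branch workflow.
# A local bare repository stands in for GitHub (assumption for the demo).
set -e
base=$(mktemp -d)
git init -q --bare "$base/remote.git"

# Setup: a local repository with one commit on master, already pushed.
git init -q "$base/local"
cd "$base/local"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "Hello there, first document in the repository" > README.txt
git add README.txt
git commit -q -m "README.txt added"
git branch -M master
git remote add origin "$base/remote.git"
git push -q -u origin master

# Create my-changes from master and switch to it (the Branch icon in the GUI).
git checkout -q -b my-changes
echo "Changes I made" > changes.txt
git add changes.txt
git commit -q -m "Add changes.txt"
git push -q -u origin my-changes        # first push of the branch; -u sets its upstream

# A second file and commit: the local branch is now ahead ("Unsynced" in the GUI).
echo "More changes" > more-changes.txt  # hypothetical file name
git add more-changes.txt
git commit -q -m "Add more-changes.txt"
ahead=$(git rev-list --count '@{u}..HEAD')
echo "commits not yet on the remote: $ahead"

git pull -q    # bring down anything new on the remote branch (nothing here)
git push -q    # upload the local commit; the upstream is already set
```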
Now, if I go back to the GitHub site, I can see the changes that I made to the my-changes branch.
Next, I want to merge the branch into the main branch (master), so I create a pull request. I can do this directly from the GitHub website or from the GitHub application. In the GitHub application on the local system, go to Repository, and then click on Create Pull Request. Give it a name and a description, and click on Create. This creates a pull request to merge my-changes into the main branch.
Now, go back to the GitHub page, and you will be able to see the details of the pull request.
Click on Merge pull request, provide your comment, and click on Confirm merge to merge the change. Now, you can click on Delete the branch.
Also, if you go back to the master branch of your repository (powercli_scripts in my case), you will be able to see the changes there.
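Merging the pull request happens on GitHub, but the result is roughly what the following commands would produce by hand: an explicit merge commit on master (GitHub's default Merge pull request option creates one), followed by deleting the branch locally and on the remote. Again, a local bare repository stands in for GitHub, and the setup section only recreates the state reached above.

```shell
#!/usr/bin/env bash
# Sketch: merge my-changes into master and delete the branch, from the CLI.
# A local bare repository stands in for GitHub (assumption for the demo).
set -e
base=$(mktemp -d)
git init -q --bare "$base/remote.git"

# --- setup: recreate master and a pushed my-changes branch ---
git init -q "$base/local"
cd "$base/local"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "Hello there, first document in the repository" > README.txt
git add README.txt
git commit -q -m "README.txt added"
git branch -M master
git remote add origin "$base/remote.git"
git push -q -u origin master
git checkout -q -b my-changes
echo "Changes I made" > changes.txt
git add changes.txt
git commit -q -m "Add changes.txt"
git push -q -u origin my-changes

# --- the merge itself ---
git checkout -q master
git merge -q --no-ff -m "Merge branch 'my-changes'" my-changes  # always create a merge commit
git push -q origin master               # publish the merged master
git branch -d my-changes                # delete the local branch (it is fully merged)
git push -q origin --delete my-changes  # like clicking Delete branch on GitHub
```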
This concludes this section. Now, you should be able to create your own project or fork and work on existing projects.