(For more resources related to this topic, see here.)
Massively Parallel Processing (MPP) databases are those which partition (and optionally replicate) data into multiple nodes. All meta-information regarding data distribution is stored in master nodes. When a query is issued, it is parsed and a suitable query plan is developed as per the meta-information and executed on relevant nodes (nodes that store related user data). HP offers one such MPP database called Vertica to solve pertinent issues of Big Data analytics.
Vertica differentiates itself from other MPP databases in many ways. The following are some of the key points:
This article will guide you through the installation and creation of a Vertica cluster. This article will also cover the installation of Vertica Management Control, which is shipped with the Vertica Enterprise edition only. It should be noted that it is possible to upgrade Vertica to a higher version but vice versa is not possible.
Before installing Vertica, you should bear in mind the following points:
Vertica has various preinstallation steps that are needed to be performed for the smooth running of Vertica. Some of the important ones are covered here.
Swap space is the space on the physical disk that is used when primary memory (RAM) is full. Although swap space is used in sync with RAM, it is not a replacement for RAM. It is suggested to have 2 GB of swap space available for Vertica. Additionally, Vertica performs well when swap-space-related files and Vertica data files are configured to store on different physical disks.
Dynamic CPU frequency scaling, or CPU throttling, is where the system automatically adjusts the frequency of the microprocessor dynamically. The clear advantage of this technique is that it conserves energy and reduces the heat generated. It is believed that CPU frequency scaling reduces the number of instructions a processor can issue. Additional theories state that when frequency scaling is enabled, the CPU doesn't come to full throttle promptly. Hence, it is best that dynamic CPU frequency scaling is disabled. CPU frequency scaling can be disabled from Basic Input/Output System (BIOS). Please note that different hardware might have different settings to disable CPU frequency scaling.
It is suggested to keep a buffer of 20-30 percent of disk space per node. Vertica uses buffer space to store temporary data, which is data coming from the merge out operations, hash joins, and sorts, and data arising from managing nodes in the cluster.
Installing Vertica is fairly simple. With the following steps, we will try to understand a two-node cluster:
rpm -Uvh vertica-x.x.x-x.x.rpm
dpkg -i vertica-x.x.x-x.x.deb
Refer to the following screenshot for more details:
Running the Vertica package
/opt/vertica/sbin/install_vertica -s host_list -r rpm_package -u dba_username
Here –s is the hostname/IP of all the nodes of the cluster including the one on which Vertica is already installed. –r is the path of Vertica package and –u is the username that we wish to create for working on Vertica. This user has sudo privileges. If prompted, provide a password for the new user. If we do not specify any username, then Vertica creates dbadmin as the user, as shown in the following example:
[impetus@centos64a setups]$ sudo /opt/vertica/sbin/install_vertica -s
192.168.56.101,192.168.56.101,192.168.56.102 -r
"/ilabs/setups/vertica-6.1.3-0.x86_64.RHEL5.rpm" -u dbadmin Vertica Analytic Database 6.1.3-0 Installation Tool Upgrading admintools meta data format.. scanning /opt/vertica/config/users Starting installation tasks... Getting system information for cluster (this may take a while).... Enter password for impetus@192.168.56.102 (2 attempts left): backing up admintools.conf on 192.168.56.101 Default shell on nodes: 192.168.56.101 /bin/bash 192.168.56.102 /bin/bash Installing rpm on 1 hosts.... installing node.... 192.168.56.102 NTP service not synchronized on the hosts: ['192.168.56.101', '192.168.56.102'] Check your NTP configuration for valid NTP servers. Vertica recommends that you keep the system clock synchronized using NTP or
some other time synchronization mechanism to keep all hosts synchronized.
Time variances can cause (inconsistent) query results when
using Date/Time Functions. For instructions, see: * http://kbase.redhat.com/faq/FAQ_43_755.shtm * http://kbase.redhat.com/faq/FAQ_43_2790.shtm Info: the package 'pstack' is useful during troubleshooting.
Vertica recommends this package is installed. Checking/fixing OS parameters..... Setting vm.min_free_kbytes to 37872 ... Info! The maximum number of open file descriptors is less than 65536 Setting open filehandle limit to 65536 ... Info! The session setting of pam_limits.so is not set in /etc/pam.d/su Setting session of pam_limits.so in /etc/pam.d/su ... Detected cpufreq module loaded on 192.168.56.101 Detected cpufreq module loaded on 192.168.56.102 CPU frequency scaling is enabled. This may adversely affect the performance
of your database. Vertica recommends that cpu frequency scaling be turned off or set
to 'performance' Creating/Checking Vertica DBA group Creating/Checking Vertica DBA user Password for dbadmin: Installing/Repairing SSH keys for dbadmin Creating Vertica Data Directory... Testing N-way network test. (this may take a while) All hosts are available ... Verifying system requirements on cluster. IP configuration ... IP configuration ... Testing hosts (1 of 2).... Running Consistency Tests LANG and TZ environment variables ... Running Network Connectivity and Throughput Tests... Waiting for 1 of 2 sites... ... Test of host 192.168.56.101 (ok) ==================================== Enough RAM per CPUs (ok) -------------------------------- Test of host 192.168.56.102 (ok) ==================================== Enough RAM per CPUs (FAILED) -------------------------------- Vertica requires at least 1 GB per CPU (you have 0.71 GB/CPU) See the Vertica Installation Guide for more information. Consistency Test (ok) ========================= Info: The $TZ environment variable is not set on 192.168.56.101 Info: The $TZ environment variable is not set on 192.168.56.102 Updating spread configuration... Verifying spread configuration on whole cluster. Creating node node0001 definition for host 192.168.56.101 ... Done Creating node node0002 definition for host 192.168.56.102 ... Done Error Monitor 0 errors 4 warnings Installation completed with warnings. Installation complete. To create a database: 1. Logout and login as dbadmin.** 2. Run /opt/vertica/bin/adminTools as dbadmin 3. Select Create Database from the Configuration Menu ** The installation modified the group privileges for dbadmin. If you used sudo to install vertica as dbadmin, you will need to logout and login again before the privileges are applied.
/opt/vertica/bin/adminTools
License key prompt
Prompt for EULA
After reviewing and accepting the EULA, you will be presented with the main menu of the admin tools of Vertica.
Admin Tools Main Menu
Create database option in the Configuration menu
Name and Comment of the Database
Password for the New database
Catalog and Data Pathname
After providing all the necessary information related to the database, you will be asked to select hosts on which the database needs to be deployed. Once all the desired hosts are selected, Vertica will ask for one final check.
Final confirmation for a database creation
Database Creation
As you can see and understand this article explains briefly about, Vertica installation. One can check further by creating sample tables and performing basic CRUD operations.
For a clean installation, it is recommended to serve all the minimum requirements of Vertica. It should be noted that installation of client API(s) and Vertica Management console needs to be done separately and is not included in the basic package.
Further resources on this subject: