Preparing the analysis environment
The first step of the analysis process is to prepare the data analysis environment. Large volumes of data call for a tool built to handle them, and that tool is a database system. Relational databases are typically used for the analysis because the SQL language is both easy to use and powerful, and because those systems work well with data visualization tools and other software packages. Nonrelational databases can be used, but most investigators do not prefer them.
Any relational database software that can handle large data volumes can be used. Commercial packages, such as SQL Server and Oracle, are the most common. Free packages, such as MySQL and PostgreSQL, can also be used. This book uses SQL Server because of its user-friendly interface and powerful features.
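To make the appeal of SQL concrete, the following minimal sketch summarizes a small table with a single query. It is an illustration under stated assumptions, not the setup used in this book: Python's built-in sqlite3 module stands in for SQL Server only so that the example runs without any installation, and the invoices table, its columns, the sample rows, and the vendor_totals function are all hypothetical.

```python
import sqlite3

# Hypothetical sample data: (vendor, invoice amount). Not taken from the book.
ROWS = [
    ("Acme", 120.00),
    ("Acme", 80.00),
    ("Birch", 50.00),
    ("Cedar", 200.00),
    ("Cedar", 25.00),
    ("Cedar", 75.00),
]


def vendor_totals(rows):
    """Return (vendor, invoice_count, total_amount), largest total first."""
    # An in-memory SQLite database stands in for SQL Server here (assumption).
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE invoices (vendor TEXT, amount REAL)")
        conn.executemany("INSERT INTO invoices VALUES (?, ?)", rows)
        # One declarative statement groups, counts, sums, and sorts the rows.
        query = (
            "SELECT vendor, COUNT(*), SUM(amount) "
            "FROM invoices "
            "GROUP BY vendor "
            "ORDER BY SUM(amount) DESC"
        )
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for vendor, count, total in vendor_totals(ROWS):
        print(vendor, count, total)
```

The SELECT statement itself uses only standard SQL (GROUP BY, COUNT, SUM, ORDER BY), so the same query should run largely unchanged on SQL Server or the other relational systems named above; only the connection code would differ.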
First, download and install SQL Server 2014 Express LocalDB and SQL Server 2014 Management Studio, which are available from http://www.microsoft.com/en-us/server-cloud/products/sql-server-editions...