Improving application performance using parallel techniques
In Chapter 11, Mathematical and Parallel Techniques for Data Analysis, we consider some of the parallel techniques available for data science applications. Concurrent execution of a program can significantly improve performance. In relation to data science, these techniques range from low-level mathematical calculations to higher-level API-specific options.
This chapter includes a discussion of basic performance enhancement considerations. The choice of algorithm and the application's architecture matter as much as code-level enhancements, and both should be considered before attempting to integrate parallel techniques. If an application does not behave in the expected or desired manner, any gains from parallel optimization are irrelevant.
Matrix operations are essential to many data applications and supporting APIs. We will include a discussion in this chapter about matrix multiplication and how it is handled using a variety of approaches. Even though these operations are often hidden within the API, it can be useful to understand how they are supported.
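For reference, the following is a minimal sketch of how such a multiplication can be written by hand in plain Java using three nested loops; the class and method names are our own illustrative choices and are not tied to any API, and we assume the matrices conform (the number of columns in A equals the number of rows in B).
public class NaiveMultiplication {
    // Multiplies an m x n matrix by an n x p matrix using three nested loops
    public static double[][] multiply(double[][] a, double[][] b) {
        int m = a.length;
        int n = b.length;
        int p = b[0].length;
        double[][] c = new double[m][p];
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < n; k++) {
                    c[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return c;
    }
}
Libraries such as Apache Commons Math hide this kind of loop behind a cleaner, better-tested interface, as the next example shows.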
One approach we demonstrate utilizes the Apache Commons Math API (http://commons.apache.org/proper/commons-math/). This API supports a large number of mathematical and statistical operations, including matrix multiplication. The following example illustrates how to perform matrix multiplication.
We first declare and initialize matrices A and B:
double[][] A = {
    {0.1950, 0.0311},
    {0.3588, 0.2203},
    {0.1716, 0.5931},
    {0.2105, 0.3242}};
double[][] B = {
    {0.0502, 0.9823, 0.9472},
    {0.5732, 0.2694, 0.916}};
Apache Commons Math represents a matrix with the RealMatrix interface, found in the org.apache.commons.math3.linear package. Next, we use the Array2DRowRealMatrix constructor to create the corresponding matrices for A and B:
RealMatrix aRealMatrix = new Array2DRowRealMatrix(A);
RealMatrix bRealMatrix = new Array2DRowRealMatrix(B);
We perform the multiplication simply by using the multiply method:
RealMatrix cRealMatrix = aRealMatrix.multiply(bRealMatrix);
Finally, we use a for loop to display the results:
for (int i = 0; i < cRealMatrix.getRowDimension(); i++) {
    System.out.println(cRealMatrix.getRowVector(i));
}
The output is as follows:
{0.02761552; 0.19992684; 0.2131916}
{0.14428772; 0.41179806; 0.54165016}
{0.34857924; 0.32834382; 0.70581912}
{0.19639854; 0.29411363; 0.4963528}
Another approach to concurrent processing involves the use of Java threads. Threads are also used as a fallback by APIs such as Aparapi when a suitable GPU is not available.
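As a minimal sketch of thread-based parallelism, assuming nothing beyond the standard java.util.concurrent package, the following splits the rows of a matrix product across a fixed thread pool; the ThreadedMultiplication class and its multiply method are our own illustrative names.
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadedMultiplication {
    // Computes C = A * B, assigning each row of C to a task in a thread pool
    public static double[][] multiply(double[][] a, double[][] b)
            throws InterruptedException {
        int m = a.length;
        int n = b.length;
        int p = b[0].length;
        double[][] c = new double[m][p];
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        for (int i = 0; i < m; i++) {
            final int row = i;
            pool.submit(() -> {
                // Each task computes one full row of the result matrix
                for (int j = 0; j < p; j++) {
                    double sum = 0.0;
                    for (int k = 0; k < n; k++) {
                        sum += a[row][k] * b[k][j];
                    }
                    c[row][j] = sum;
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return c;
    }
}
Because each task writes to a distinct row of the result, the tasks do not interfere with one another and no additional synchronization is required.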
Data science applications often take advantage of the map-reduce algorithm. We will demonstrate parallel processing by using Apache's Hadoop to perform map-reduce. Designed specifically for large datasets, Hadoop can reduce processing time for large-scale data science projects. We demonstrate a technique for calculating the average value of a large dataset.
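A mapper and reducer for such an average calculation might look like the following sketch; the class names, the single grouping key, and the assumption of one numeric value per input line are our own choices for illustration rather than part of a specific Hadoop example.
import java.io.IOException;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: emits every numeric input line under a single key so that
// one reducer sees all of the values
public class AverageMapper extends Mapper<LongWritable, Text, Text, DoubleWritable> {
    private static final Text KEY = new Text("average");

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        context.write(KEY, new DoubleWritable(Double.parseDouble(line.toString().trim())));
    }
}

// Reducer: sums the values for the key and divides by their count
class AverageReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {
    @Override
    protected void reduce(Text key, Iterable<DoubleWritable> values, Context context)
            throws IOException, InterruptedException {
        double sum = 0.0;
        long count = 0;
        for (DoubleWritable value : values) {
            sum += value.get();
            count++;
        }
        context.write(key, new DoubleWritable(sum / count));
    }
}
Routing every value to a single key keeps the sketch simple, though it also funnels all values through one reducer; a more scalable variant would emit per-partition sums and counts and combine them afterwards.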
We also include examples of APIs that support multiple processors, including CUDA and OpenCL. CUDA is accessed through the Java bindings for CUDA (JCuda) (http://jcuda.org/), and we discuss OpenCL and its Java support as well. The Aparapi API provides high-level support for using multiple CPUs or GPUs, and we include a demonstration of Aparapi applied to matrix multiplication.
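To give a flavor of that demonstration, the following sketch expresses matrix multiplication as an Aparapi Kernel over flattened, row-major arrays; the com.aparapi package name corresponds to recent releases of the library, and the surrounding class and method names are our own.
import com.aparapi.Kernel;
import com.aparapi.Range;

public class AparapiMultiplication {
    // Computes C = A * B where the matrices are stored row-major in 1D arrays
    public static float[] multiply(final float[] a, final float[] b,
                                   final int m, final int n, final int p) {
        final float[] c = new float[m * p];
        Kernel kernel = new Kernel() {
            @Override
            public void run() {
                int id = getGlobalId();  // one work item per element of C
                int row = id / p;
                int col = id % p;
                float sum = 0f;
                for (int k = 0; k < n; k++) {
                    sum += a[row * n + k] * b[k * p + col];
                }
                c[id] = sum;
            }
        };
        kernel.execute(Range.create(m * p));
        kernel.dispose();
        return c;
    }
}
When no OpenCL-capable device is found, Aparapi executes the kernel in a Java thread pool instead, which ties this example back to the thread-based approach discussed above.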