Pentaho Data Integration and Pentaho BI Suite
Before introducing PDI, let’s talk about Pentaho BI Suite. The Pentaho Business Intelligence Suite is a collection of software applications intended to create and deliver solutions for decision making. The main functional areas covered by the suite are:
- Analysis: The analysis engine serves multidimensional analysis. It’s provided by the Mondrian OLAP server.
- Reporting: The reporting engine allows designing, creating, and distributing reports in various known formats (HTML, PDF, and so on), from different kinds of sources.
- Data Mining: Data mining is used for running data through algorithms in order to understand the business and do predictive analysis. Data mining is possible thanks to the Weka Project.
- Dashboards: Dashboards are used to monitor and analyze Key Performance Indicators (KPIs). The Community Dashboard Framework (CDF), a plugin developed by the community and integrated in the Pentaho BI Suite, allows the creation of interesting dashboards including charts, reports, analysis views, and other Pentaho content, without much effort.
- Data Integration: Data integration is used to integrate scattered information from different sources (applications, databases, files, and so on), and make the integrated information available to the final user. Pentaho Data Integration—the tool that we will learn to use throughout the book—is the engine that provides this functionality.
All of this functionality can be used standalone but also integrated. In order to run analysis, reports, and so on, integrated as a suite, you have to use the Pentaho BI Platform. The platform has a solution engine, and offers critical services, for example, authentication, scheduling, security, and web services.
This set of software and services form a complete BI Platform, which makes Pentaho Suite the world’s leading open source Business Intelligence Suite.