Pentaho Analytics for MongoDB Cookbook

Over 50 recipes to learn how to use Pentaho Analytics and MongoDB to create powerful analysis and reporting solutions

Authors: Harris Ward, Joel Andre Latino
Paperback, 1st Edition, 218 pages. Published in Dec 2015. ISBN-13 9781783553273.
Table of Contents: Preface; 1. PDI and MongoDB; 2. The Thin Kettle JDBC Driver; 3. Pentaho Instaview; 4. A MongoDB OLAP Schema; 5. Pentaho Reporting; 6. The Pentaho BI Server; 7. Pentaho Dashboards; 8. Pentaho Community Contributions; Index

Exporting MongoDB data using the aggregation framework

In this recipe, we will explore how to use the MongoDB aggregation framework in the MongoDB Input step. We will build a simple example that reads data from a collection and shows how the aggregation framework can prepare that data before it enters the PDI stream.

Getting ready

To get ready for this recipe, start Spoon, your ETL development environment, and make sure that the MongoDB server is running and contains the data loaded in the previous recipe.

How to do it…

The following steps introduce the use of the MongoDB aggregation framework:

  1. Create a new empty transformation.
    1. Set the transformation name to PDI using MongoDB Aggregation Framework.
    2. Save the transformation with the name chapter1-using-mongodb-aggregation-framework.
  2. Select data from the Orders collection using the MongoDB Input step.
    1. Select the Design tab in the left-hand-side view.
    2. From the Big Data category folder, find the MongoDB Input step and drag and drop it into the working area in the right-hand-side view.
    3. Double-click on the step to open the MongoDB Input dialog.
    4. Set the step name to Select 'Baane Mini Imports' Orders.
    5. Select the Input options tab. Click on the Get DBs button and select the SteelWheels option for the Database field. Next, click on Get collections and select the Orders option for the Collection field.
    6. Select the Query tab and then check the Query is aggregation pipeline option. In the text area, write the following aggregation query:
      [
        { $match: { "customer.name": "Baane Mini Imports" } },
        { $group: {
            "_id": { "orderNumber": "$orderNumber", "orderDate": "$orderDate" },
            "totalSpend": { $sum: "$totalPrice" }
        } }
      ]
    7. Uncheck the Output single JSON field option.
    8. Select the Fields tab. Click on the Get Fields button and you will get a list of fields returned by the query. You can preview your data by clicking on the Preview button.
    9. Click on the OK button to finish the configuration of this step.
  3. Add a Dummy step to the stream. This step does nothing, but it gives us a step to select when previewing the data. Drag the Dummy step from the Flow category into the workspace and name it OUTPUT.
  4. Create a hop between the Select 'Baane Mini Imports' Orders step and the OUTPUT step.
  5. Select the OUTPUT dummy step and preview the data.
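Unchecking Output single JSON field makes the step emit one PDI field per value instead of one JSON string per document, so Get Fields has to discover the paths inside each result document, including the ones nested in the compound `_id`. The sketch below, a hypothetical `leafPaths` helper in plain JavaScript, lists the leaf paths of one grouped result document. It assumes the `$.`-prefixed path style the Fields tab typically shows, treats arrays as single leaves for simplicity, and uses made-up values.

```javascript
// Hypothetical helper (not part of PDI): list the leaf paths of a document,
// roughly the way a "get fields" pass discovers columns in nested query output.
function leafPaths(doc, prefix = "$") {
  const paths = [];
  for (const [key, value] of Object.entries(doc)) {
    const path = `${prefix}.${key}`;
    if (value !== null && typeof value === "object" && !Array.isArray(value)) {
      // Recurse into embedded documents such as the compound _id.
      paths.push(...leafPaths(value, path));
    } else {
      // Scalars (and, in this simplified sketch, arrays) become one field each.
      paths.push(path);
    }
  }
  return paths;
}

// One output document shaped like the $group stage's result (values are made up).
const sampleResult = {
  _id: { orderNumber: 10001, orderDate: "2004-01-15" },
  totalSpend: 150,
};

console.log(leafPaths(sampleResult));
// [ '$._id.orderNumber', '$._id.orderDate', '$.totalSpend' ]
```

Each path would become one column in the PDI stream, which is why the grouped result shows up as three fields rather than two.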

How it works…

The MongoDB aggregation framework lets you define a sequence of operations, or stages, that are executed as a pipeline, much like a Unix command-line pipeline: the output of each stage becomes the input of the next. You can filter, group, and sort your collection data before it even enters the PDI stream.
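To make the pipeline analogy concrete, here is a minimal sketch in plain JavaScript (Node.js, no MongoDB driver) in which each stage is a function from an array of documents to a new array and the stages are applied in order. The helpers `runPipeline`, `match`, `sortBy` and `limit` and the sample documents are hypothetical illustrations; real `$match`, `$sort` and `$limit` stages support far more than the simple equality test, single-field sort and slicing shown here.

```javascript
// Hypothetical sketch: a pipeline is an ordered list of stage functions.
// Each stage takes an array of documents and returns a new array, and the
// output of one stage feeds the next, like commands joined with | in a shell.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Keep documents whose top-level fields equal the given values (equality only).
const match = (criteria) => (docs) =>
  docs.filter((doc) => Object.entries(criteria).every(([k, v]) => doc[k] === v));

// Sort on one top-level field: direction 1 = ascending, -1 = descending.
const sortBy = (field, direction = 1) => (docs) =>
  [...docs].sort((a, b) =>
    a[field] < b[field] ? -direction : a[field] > b[field] ? direction : 0
  );

// Keep only the first n documents.
const limit = (n) => (docs) => docs.slice(0, n);

// Hypothetical sample documents, not taken from the SteelWheels data.
const orders = [
  { orderNumber: 1, status: "Shipped", totalPrice: 120 },
  { orderNumber: 2, status: "Cancelled", totalPrice: 80 },
  { orderNumber: 3, status: "Shipped", totalPrice: 300 },
];

// Filter, then sort by price (highest first), then keep the top document.
const result = runPipeline(orders, [
  match({ status: "Shipped" }),
  sortBy("totalPrice", -1),
  limit(1),
]);
// result: [{ orderNumber: 3, status: "Shipped", totalPrice: 300 }]
```

Swapping the order of the stages changes the result, just as it does in a real aggregation pipeline; for example, putting `limit(1)` first would keep only order 1 before any filtering happens.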

In this case, we use the MongoDB Input step to execute an aggregation pipeline, which is technically the same as calling db.collection.aggregate(). The pipeline has two stages. The first stage, $match, filters the documents by customer name, in this case Baane Mini Imports. The second stage, $group, groups the matching documents by order number and order date and sums their total price into a totalSpend field.
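The following sketch emulates the recipe's two stages in plain JavaScript so you can see what the server returns without running MongoDB. It assumes each document in Orders holds one order line with `orderNumber`, `orderDate`, `totalPrice` and an embedded `customer.name`, as the query implies; the functions `matchCustomer` and `groupByOrder` and all sample values are hypothetical.

```javascript
// Stage 1 ($match): keep documents whose embedded customer.name matches.
function matchCustomer(docs, name) {
  return docs.filter((doc) => doc.customer && doc.customer.name === name);
}

// Stage 2 ($group): group by { orderNumber, orderDate } and sum totalPrice
// into totalSpend, producing documents shaped like $group's output.
function groupByOrder(docs) {
  const groups = new Map();
  for (const doc of docs) {
    // Build a string key from the compound _id so equal ids share one group.
    const key = JSON.stringify([doc.orderNumber, doc.orderDate]);
    if (!groups.has(key)) {
      groups.set(key, {
        _id: { orderNumber: doc.orderNumber, orderDate: doc.orderDate },
        totalSpend: 0,
      });
    }
    // $sum ignores non-numeric values; mimic that for a missing totalPrice.
    if (typeof doc.totalPrice === "number") {
      groups.get(key).totalSpend += doc.totalPrice;
    }
  }
  return [...groups.values()];
}

// Hypothetical order lines: two lines for order 10001, one for 10002,
// and one line from a different customer that $match should drop.
const sampleOrders = [
  { orderNumber: 10001, orderDate: "2004-01-15", totalPrice: 100, customer: { name: "Baane Mini Imports" } },
  { orderNumber: 10001, orderDate: "2004-01-15", totalPrice: 50, customer: { name: "Baane Mini Imports" } },
  { orderNumber: 10002, orderDate: "2004-02-20", totalPrice: 75, customer: { name: "Baane Mini Imports" } },
  { orderNumber: 10003, orderDate: "2004-02-21", totalPrice: 999, customer: { name: "Another Customer" } },
];

const output = groupByOrder(matchCustomer(sampleOrders, "Baane Mini Imports"));
// output:
// [ { _id: { orderNumber: 10001, orderDate: "2004-01-15" }, totalSpend: 150 },
//   { _id: { orderNumber: 10002, orderDate: "2004-02-20" }, totalSpend: 75 } ]
```

Note that MongoDB's `$group` does not guarantee the order of its output documents; the sketch returns groups in first-seen order only because a JavaScript `Map` preserves insertion order. Add a `$sort` stage to the real pipeline if the PDI stream needs a fixed order.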

See also

In the next recipe, we will look at another way to aggregate data: MongoDB Map/Reduce.
