You're reading from Learning Pentaho Data Integration 8 CE: An end-to-end guide to exploring, transforming, and integrating your data across multiple sources

Product type Paperback

Published in Dec 2017

Publisher Packt

ISBN-13 9781788292436

Length 500 pages

Edition 3rd Edition

Languages

Java

Tools

Pentaho

Concepts

Data Processing

Author (1):

María Carina Roldán

Table of Contents (17 chapters)

Preface

1. Getting Started with Pentaho Data Integration

2. Getting Started with Transformations

3. Creating Basic Task Flows

4. Reading and Writing Files

5. Manipulating PDI Data and Metadata

6. Controlling the Flow of Data

7. Cleansing, Validating, and Fixing Data

8. Manipulating Data by Coding

9. Transforming the Dataset

10. Performing Basic Operations with Databases

11. Loading Data Marts with PDI

12. Creating Portable and Reusable Transformations

13. Implementing Metadata Injection

14. Creating Advanced Jobs

15. Launching Transformations and Jobs from the Command Line

16. Best Practices for Designing and Deploying a PDI Project

Converting rows to columns

In most datasets, each row describes a different element, such as a sale or a customer. However, in some datasets a single row doesn't completely describe one element. Take, for example, the file from Chapter 8, Manipulating Data by Coding, containing information about houses. Each house was described by several rows, and each row gave only part of the information about it. Ideally, all the attributes of a house would be in a single row. With PDI, you can convert the data to this alternative format.

Converting row data to column data using the Row denormaliser step

The Row denormaliser step converts the incoming dataset to a new dataset by moving information from rows to columns according to the values of a key field.
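
The idea of moving values from rows to columns can be sketched outside PDI in a few lines of Python. This is only an illustration of the concept, not how the step is implemented: the `denormalise` function, the field names (`house_id`, `attribute`, `value`), and the sample rows are all hypothetical and do not come from the book's files.

```python
def denormalise(rows, group_field, key_field, value_field):
    """Pivot key/value rows into one wide row per group (illustrative only)."""
    wide = {}
    for row in rows:
        group = row[group_field]
        # Create the wide record the first time a group is seen.
        record = wide.setdefault(group, {group_field: group})
        # The value of the key field becomes the name of a new column.
        record[row[key_field]] = row[value_field]
    return list(wide.values())


# Hypothetical input: each house is described across several rows.
rows = [
    {"house_id": 1, "attribute": "bedrooms", "value": "3"},
    {"house_id": 1, "attribute": "garage", "value": "yes"},
    {"house_id": 2, "attribute": "bedrooms", "value": "2"},
    {"house_id": 2, "attribute": "garage", "value": "no"},
]

houses = denormalise(rows, "house_id", "attribute", "value")
for house in houses:
    print(house)
```

Here `house_id` plays the role of the grouping field, `attribute` is the key field whose values turn into column names, and `value` supplies the contents of those columns, so each house ends up in a single row.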

To understand how the Row denormaliser step works, let's look at an example. We will work with a file containing a list of French movies of all time. This is how it looks: