What do you get with Print?

Instant access to your digital copy whilst your Print order is shipped

Paperback book shipped to your preferred address

Redeem a companion digital copy on all Print orders

Access this title in our online reader with advanced features

DRM FREE - Read whenever, wherever and however you want

Pentaho Data Integration 4 Cookbook

Chapter 2. Reading and Writing Files

In this chapter, we will cover:

Reading a simple file
Reading several files at the same time
Reading unstructured files
Reading files having one field by row
Reading files having some fields occupying two or more rows
Writing a simple file
Writing an unstructured file
Providing the name of a file (for reading or writing) dynamically
Using the name of a file (or part of it) as a field
Reading an Excel file
Getting the value of specific cells in an Excel file
Writing an Excel file with several sheets
Writing an Excel file with a dynamic number of sheets

Key benefits

Manipulate your data by exploring, transforming, validating, integrating, and more

Work with all kinds of data sources such as databases, plain files, and XML structures among others

Use Kettle together with other components of the Pentaho Business Intelligence Suite

Each recipe is a carefully organized sequence of instructions packed with screenshots, tables, and tips to complete the task as efficiently as possible

Description

Pentaho Data Integration (PDI, also called Kettle), one of the leading data integration tools, is broadly used for all kinds of data manipulation, such as migrating data between applications or databases, exporting data from databases to flat files, data cleansing, and much more. Do you need quick solutions to the problems you face while using Kettle? Pentaho Data Integration 4 Cookbook explains Kettle features in detail through clear and practical recipes that you can quickly apply to your solutions. The recipes cover a broad range of topics, including processing files, working with databases, understanding XML structures, integrating with the Pentaho BI Suite, and more.

Pentaho Data Integration 4 Cookbook shows you how to take advantage of all aspects of Kettle through a set of practical recipes organized to help you find quick solutions to your needs. The initial chapters explain the details of working with databases, files, and XML structures. Then you will see different ways of searching data, executing and reusing jobs and transformations, and manipulating streams. Further, you will learn all the available options for integrating Kettle with other Pentaho tools.

Pentaho Data Integration 4 Cookbook has plenty of recipes with easy step-by-step instructions to accomplish specific tasks. There are examples and code that are ready to be adapted to individual needs.

Who is this book for?

If you are a software developer or anyone involved or interested in developing ETL solutions or, in general, doing any kind of data manipulation, this book is for you. It does not cover PDI basics, SQL basics, or database concepts. You are expected to have a basic understanding of the PDI tool, the SQL language, and databases.

What you will learn

Configure Kettle to connect to databases, explore them, and perform CRUD operations

Read, write, and parse simple and unstructured files

Solve common Excel needs such as reading from a particular cell or generating several sheets at a time

Read, validate, and generate simple and complex XML structures

Manipulate files by copying, deleting, compressing, or transferring to remote servers

Look up information from different sources such as databases, web services, or spreadsheets among others

Work with data flows performing operations such as joining, merging, or filtering rows

Customize the Kettle logs to your needs

Embed Java code in your transformations to gain performance and flexibility

Execute and reuse transformations and jobs in different ways

Integrate Kettle with Pentaho Reporting, Pentaho Dashboards, Community Data Access, and Pentaho BI Platform


Frequently bought together

Pentaho Data Integration Beginner's Guide - Second Edition

€41.99

Pentaho 5.0 Reporting by Example: Beginner's Guide

€41.99

€37.99

Total €121.97


Amazon verified reviews

Benaglia Nicola Jul 15, 2011

Pentaho Data Integration (PDI) has reached its 4th version with a lot of interesting new features and capabilities. This versatile tool is a must for everyone working with data integration. In PDI, transformations and jobs are the means to carry out a task: reading, writing, manipulating, and integrating data, or performing mathematical or logical operations. All of this is typical of an ETL tool (where ETL stands for Extract, Transform and Load). Do you need to move data from an Excel file to a database, or from a database to a text file? Do you need to extract data from an LDAP server, FTP, mail, a log file, a compressed file, a web service, or a web site? Must all this be done regularly and automatically? Wouldn't it be cool to be notified by email if the process failed? Sure, you can do it in a lot of ways, but an ETL tool gives you the necessary help. In addition, an open source ETL tool like Pentaho Data Integration has a strong and skilled community behind it to help you.

This book provides a lot of step-by-step examples (called "recipes") with many practical, useful, and very smart hints and strategies for developing transformations and jobs. New steps (a step is a basic task, for example reading from a file, sorting, grouping, calculating, ...) are very well described and explained. The chapters of this book cover in depth all you need to know to understand the software, be ready to write your own transformations, and become productive quickly. I found the space dedicated to the following topics very useful:
- reading and writing files: unstructured and structured text files, Excel and OpenOffice spreadsheets
- XML files and validation with DTD and XSD schemas
- using the fuzzy match step
- reuse and flexibility of transformations (named parameters, variables, mappings)
- sending email with logs about the status of the execution
- file management: retrieving files from servers such as FTP, copying, moving, deleting, comparing
- integration of Kettle with the Pentaho Suite (Pentaho Reporting Engine)

The way all these subjects are explained is progressive and gradual. The use of targeted examples makes the reading very pleasant and easy. I recommend this book to you.

Amazon Verified review

donJaneiro Aug 10, 2011

This book is an excellent read (the product itself is not that bad either...). It would seem that you could use this book to supplement the one written by Matt Casters and co. If you are a newbie to ETL, I suppose this would be the preferred approach. The book is clearly categorized and the recipes are interrelated. Another refreshing aspect of the book is that the authors are opinionated when necessary. This means that you can learn from their ideas on best practices regarding data flows. If you have a background in, e.g., integration services and you know which data-flow pattern you want to implement, but perhaps don't know how to do it in PDI, then this book can be seen as a sort of conversion guide (and then some). The book is also peppered with links to great sites concerning the Pentaho toolset (especially PDI).

Data Aggregator Jul 07, 2011

PDI4_Cookbook is worth owning even if you have the 'other' two Kettle books on hand, given the depth of that open source product. Easy to follow, it could be used as a first book on PDI once you get through the basic install/sample documents from the Pentaho.com site. It is well organized, up to date with PDI4 features, and the recipes are for the most part truly useful in the ETL domain. The authors spend a sizable amount of space on the more obscure but useful Kettle facilities (e.g., sub-transformations or generating sample data), so this book will pay off in web searches for solutions. Recommended for all PDI developers!

Nelson Sousa Aug 02, 2011

This book is a very good guide for new users, not only to Pentaho Data Integration but to ETL in general. All recipes are based on real-world examples that one encounters quite often, and the explanations of how things work are a valuable resource for those who are taking their first steps in data integration and like to learn by example. But experienced users can also benefit from the recipes, given their wide range and applicability. I've worked with Maria Roldan for the past year and, despite being an experienced Pentaho Data Integration user, I keep a copy of the book on my desk and browse through it every now and again when I'm facing a problem I remember seeing solved quickly and elegantly there.

Bill From Ann Arbor Nov 29, 2014

Incredibly informative and useful. This book probably saved me at least 100 hours of ramp-up time.

Pentaho Data Integration 4 Cookbook: Over 70 recipes to solve ETL problems using Pentaho Kettle
