Pentaho Data Integration 4 Cookbook

Over 70 recipes to solve ETL problems using Pentaho Kettle
Product type: Paperback
Published in: Jun 2011
Publisher: Packt
ISBN-13: 9781849515245
Length: 352 pages
Edition: 1st Edition

Table of Contents (17 chapters)

Pentaho Data Integration 4 Cookbook
Credits
About the Authors
About the Reviewers
www.PacktPub.com
Preface
1. Working with Databases
2. Reading and Writing Files
3. Manipulating XML Structures
4. File Management
5. Looking for Data
6. Understanding Data Flows
7. Executing and Reusing Jobs and Transformations
8. Integrating Kettle and the Pentaho Suite
9. Getting the Most Out of Kettle
Data Structures
Index

Index

A

  • action sequence
    • about / Executing a PDI transformation as part of a Pentaho process, How it works...
  • Add sequence step
    • avoiding / Avoiding using an Add sequence step to enumerate the rows
  • Ad Hoc Reporting user interface
    • about / There's more...
  • alternative notation
    • for separator / Alternative notation for a separator
  • ALTER TABLE statement / There's more...
  • arguments
    • values, supplying for / Supplying values for named parameters, variables and arguments
  • Argument tab / How it works...
  • attached files
    • e-mails, sending with / Sending e-mails with attached files, How to do it..., How it works...
  • authors.txt file / Getting ready

B

  • Books data structure / Authors
  • Business Intelligence solutions / Introduction

C

  • CAG
    • about / Generating all possible pairs formed from two datasets
  • Calculator step / How it works...
  • Cartesian product
    • performing, between dataset / Generating all possible pairs formed from two datasets, How to do it..., How it works...
  • CDA
    • about / There's more...
  • CDA DataAccess
    • about / How it works...
  • CDA plugin
    • files, generating with / Generating files from the PUC with PDI and the CDA plugin, How to do it..., How it works...
    • about / Generating files from the PUC with PDI and the CDA plugin
  • CDA previewer
    • about / How to do it...
  • CDF
    • about / Limiting the number of output rows, Populating a CDF dashboard with data coming from a PDI transformation, There's more...
  • CDF dashboard
    • about / Populating a CDF dashboard with data coming from a PDI transformation
    • populating, with data from PDI transformation / Getting ready, How to do it..., How it works...
    • visual elements, adding / How to do it...
  • cell
    • searching, in Excel file / Looking for a given cell
  • cells value
    • retrieving, from Excel file / Getting the value of specific cells in an Excel file, How to do it..., How it works...
  • changed flag / How it works...
  • Check Db connection job entry / Checking the database connection at run-time
  • clean log file
    • creating / Creating a clean log file
  • code
    • generalizing, parameters used / Generalizing your code
  • Combination lookup/update step / Using the Combination lookup/update for looking up
  • commercial databases / Connecting to a database not supported by Kettle
  • Community Chart Framework (CCF) / How it works...
  • Community Dashboard Editor (CDE) / Getting ready
  • compare stream / How it works...
  • complex conditions, issues
    • overcoming / Overcoming the difficulties of complex conditions
  • complex XML structures
    • generating / Generating complex XML structures, Getting ready, How to do it..., How it works...
  • compressed files
    • about / Working with ZIP files
  • connection
    • creating, to database / Connecting to a database, How to do it..., How it works...
  • connection settings, for database / Getting ready
  • copied row
    • accessing, from transformations / Accessing the copied rows from jobs, transformations, and other entries
    • accessing, from jobs / Accessing the copied rows from jobs, transformations, and other entries
  • copy/get rows mechanism
    • about / How it works..., Serializing/De-serializing data
  • Copy Files job / Copying or moving one or more files, How it works...
  • Copy rows to result step / There's more..., Executing a transformation or part of a job once for every file in a list of files
  • countries.xml file / Getting ready
  • CREATE INDEX statement / There's more...
  • CREATE TABLE statement / Creating or altering a database table from PDI (design time), There's more...
  • cron
    • about / There's more...
  • CSV
    • about / Reading files with fixed width fields, How it works...
  • CSV file
    • about / Getting ready
  • currentSpecialChar flag / How it works...
  • current_conditions step
    • preview / A sample transformation
  • current_conditions_normalized step
    • preview / A sample transformation
  • custom list of files
    • copying / Copying or moving a custom list of files, How to do it..., How it works...
    • moving / Copying or moving a custom list of files, How to do it..., How it works...
    • deleting / Deleting a custom list of files, How to do it..., How it works..., See also
  • custom log file
    • generating / Generating a custom log file, How to do it..., How it works...

D

  • Damerau-Levenshtein algorithm / There's more...
  • dashboard
    • about / Populating a CDF dashboard with data coming from a PDI transformation
  • data
    • retrieving, from database / Getting data from a database, How it works..., There's more...
    • retrieving, with parameters from database / Getting data from a database by providing parameters, How it works..., Executing the SELECT statement several times, each for a different set of parameters
    • retrieving, with query / Getting data from a database by running a query built at runtime, How it works...
    • deleting, from table / Deleting data from a table, Getting ready, How to do it..., How it works...
    • retrieving, from different path / Getting data from a different path
    • retrieving, selectively / Getting data selectively
    • searching, in database table / Looking for values in a database table, How to do it..., How it works...
    • searching, in database / Getting ready, How it works...
    • searching, in database with extreme flexibility / How to do it..., How it works...
    • searching, in resources / Looking for values in a variety of sources, How to do it..., How it works...
    • searching, by proximity / Looking for values by proximity, How to do it..., How it works...
    • searching, by consuming web services / Looking for values consuming a web service, How to do it..., How it works...
    • searching, over intranet / Looking for values over an intranet or Internet, How to do it..., How it works...
    • searching, over internet / Looking for values over an intranet or Internet, How to do it..., How it works...
    • de-serializing / Serializing/De-serializing data
    • serializing / Serializing/De-serializing data
    • generating, Data grid step used / Using Data grid step to generate specific data
  • dataAccessId parameter / How it works...
  • database
    • about / Introduction
    • connecting to / Connecting to a database, How to do it..., How it works...
    • connection, creating to / Connecting to a database, How to do it..., How it works...
    • same database connection, avoiding / Avoiding creating the same database connection over and over again
    • advanced connection properties, specifying / Specifying advanced connection properties
    • data, retrieving from / Getting data from a database, How it works..., There's more...
    • creating, PDI used / Creating or altering a database table from PDI (design time), How to do it..., How it works...
    • altering, PDI used / Creating or altering a database table from PDI (design time), How to do it..., How it works...
    • creating, from PDI / Creating or altering a database table from PDI (runtime), How to do it..., There's more...
    • data, searching in / Getting ready, How it works...
  • database connection
    • about / How it works...
    • verifying, on runtime / Checking the database connection at run-time
    • modifying, at runtime / Changing the database connection at runtime, How to do it..., How it works...
  • Database connections option / How to do it...
  • Database Explorer window / There's more...
  • Database join step / How it works..., There's more..., Looking for values in a database with extreme flexibility
    • advantages / There's more...
  • Database lookup step / How it works..., Looking for values in a database with extreme flexibility
  • database storage methods / Introduction
  • database table
    • data, searching in / Looking for values in a database table, How to do it..., How it works...
  • Data grid step
    • data, generating with / Using Data grid step to generate specific data
    • about / Using Data grid step to generate specific data
    • history / Using Data grid step to generate specific data
  • data lookup
    • issues / Taking some action when the lookup fails, Taking some action when there are too many results, Looking for non-existent data
  • dataset
    • about / Introduction
    • Cartesian product, performing between / Generating all possible pairs formed from two datasets, How to do it..., How it works...
  • data source
    • defining, for report / How to do it...
  • data types / About data types and formats
  • data types equivalence / Data type's equivalence
  • data warehouses
    • about / Introduction
  • DBMS
    • about / Sample databases
  • Default target step / Avoiding the use of nested Filter Rows steps
  • deleted files
    • figuring out / Figuring out which files have been deleted
  • deleted flag / How it works...
  • Delete file job / How it works...
  • Delete File job / Figuring out which files have been deleted
  • Delete filenames from result job entry / How it works...
  • delete operation / Insert, update, and delete all-in-one
  • DELETE operation / How it works...
  • delta_value sequence / How it works...
  • desc_product fieldname / Changing headers
  • detail files / Master/detail files
  • Detect empty stream step / Executing steps even when your stream is empty, There's more...
  • dialog
    • exploring, for UDJC step / How it works...
  • dimension table
    • about / Creating or altering a database table from PDI (design time)
  • disk-storage based databases / Connecting to a database not supported by Kettle
  • doQuery feature
    • about / How it works...
  • Double Metaphone algorithm / There's more...
  • DTD
    • limitations / There's more...
  • DTD definitions
    • about / Validating an XML file against DTD definitions
    • XML file, validating against / Validating an XML file against DTD definitions, Getting ready, How it works...
  • Dummy steps
    • avoiding / Avoiding the use of Dummy steps
  • Dynamic SQL row step / Looking for values in a database with extreme flexibility, How it works..., There's more...

E

  • e-mails
    • benefits / Sending e-mails with attached files
    • sending, with attached files / Sending e-mails with attached files, How to do it..., How it works...
    • logs, sending through / Sending logs through an e-mail
    • sending, in transformation / Sending e-mails in a transformation
  • E4X
    • about / ECMAScript for XML
    • URL / ECMAScript for XML
  • email job entry / Sending e-mails with attached files
  • Enable safe mode option / Telling Kettle how to merge the rows of your streams
  • encoding / About file format and encoding
  • Excel file
    • reading / Getting ready, How it works...
    • cells value, retrieving in / Getting the value of specific cells in an Excel file, How to do it..., How it works...
    • labels, horizontally arranged / Labels and values horizontally arranged
    • cell, searching / Looking for a given cell
    • writing, with multiple sheets / Writing an Excel file with several sheets, How to do it..., How it works...
    • writing, with dynamic sheets / Writing an Excel file with a dynamic number of sheets, How to do it..., How it works...
    • about / Getting ready
  • Excel Writer plugin step / Sending e-mails in a transformation
  • Execute a transformation window / Sample transformations
  • Execute for every input row? option / There's more...

F

  • field
    • filename, using as / Using the name of a file (or part of it) as a field, How to do it..., How it works...
    • comparing, against constant value / Comparing against the value of a Kettle variable
    • comparing, against another field / Comparing against the value of a Kettle variable
  • fields
    • specifying, XPath notation used / Specifying fields by using XPath notation, Getting ready, How it works...
    • generating, with XML structures / Generating fields with XML structures
  • file-based jobs
    • information, retrieving of / Getting information about transformations and jobs (file-based), How to do it..., How it works...
  • file-based transformations
    • information, retrieving of / Getting information about transformations and jobs (file-based), How to do it..., How it works...
  • File Compare job entry / How it works...
  • file existence
    • detecting / Detecting the existence of the files before copying them
  • file format / About file format and encoding
  • file list transformation / Sample transformation: File list
  • File management category / Comparing folders
  • File Management category / There's more...
  • filename
    • using, as field / Using the name of a file (or part of it) as a field, How to do it..., How it works...
  • files
    • about / Introduction
    • simple file, reading / Reading a simple file, How to do it..., How it works...
    • reading, with fixed width fields / Reading files with fixed width fields
    • multiple files, reading at once / Reading several files at the same time, How to do it..., How it works...
    • unstructured files, reading / Reading unstructured files, Getting ready, How to do it..., How it works...
    • reading, with one field by row / Reading files having one field by row, Getting ready, How to do it..., There's more...
    • simple file, writing / Getting ready, How it works...
    • unstructured file, writing / Writing an unstructured file, How to do it..., How it works...
    • name, providing to / Providing the name of a file (for reading or writing) dynamically, How to do it..., How it works...
    • Excel file, reading / Getting ready, How it works...
    • simple XML files, reading / Reading simple XML files, How to do it..., How it works...
    • moving / Getting ready, How it works..., Moving files
    • copying / Getting ready, How it works...
    • existence, detecting for / Detecting the existence of the files before copying them
    • deleting / Getting ready, How it works...
    • deleted file, figuring out / Figuring out which files have been deleted
    • retrieving, from remote server / How to do it..., How it works...
    • specifying, to transfer / Specifying files to transfer
    • accessing, with SFTP / Access via SFTP
    • accessing, with FTPS / Access via FTPS
    • placing, on remote server / Putting files on a remote server, How to do it..., How it works...
    • comparing, with folders / Comparing files and folders, How to do it...
    • creating / Executing a job or a transformation by setting static arguments and parameters
    • generating, from PUC with CDA plugin / Generating files from the PUC with PDI and the CDA plugin, How to do it..., How it works...
    • generating, from PUC with PDI / Generating files from the PUC with PDI and the CDA plugin, How to do it..., How it works...
  • files, reading
    • with fields, occupying two or more rows / Reading files with some fields occupying two or more rows, How to do it..., How it works...
  • files, unzipping
    • avoiding / Avoiding unzipping files
  • files, zipping
    • avoiding / Avoiding zipping files
  • filesToAttach folder / How to do it...
  • file transfer
    • information, retrieving for / Getting information about the files being transferred
  • Filter rows step / Splitting a stream into two or more streams based on a condition, How it works..., Avoiding the use of Dummy steps, Avoiding the use of nested Filter Rows steps
  • findInfoRowSet() method / Looking up information with additional steps
  • fixed width fields
    • files, reading with / Reading files with fixed width fields
  • Flow category / Introduction
  • folders
    • creating / Creating folders
    • files, comparing with / Comparing files and folders, How to do it...
    • comparing / Comparing folders
  • Foodmart / Sample databases
  • format
    • providing, to output fields / Giving the output fields a format
  • FTP
    • about / Getting files from a remote server
  • FTPS
    • files, accessing with / Access via FTPS
    • about / Access via FTPS
  • FTP server
    • connections considerations / Some considerations about connecting to an FTP server
  • FULL OUTER join / There's more...
  • fuzzy match algorithm
    • about / How it works...
  • Fuzzy match step / How to do it...
    • algorithms / There's more...

G

  • Generate random value step
    • about / There's more...
  • Generate rows step / How it works...
  • GENRE parameter / How it works...
  • get() method / How it works...
  • getRow() function / How it works...
  • getRowFrom() method / Looking up information with additional steps
  • Get rows from result step / How it works...
  • Get SQL select statement... button / There's more...
  • Get System Info step
    • about / Get System Info
  • Get Variable step / Getting variables in the middle of the stream
  • Get XML Data step / Reading simple XML files, How it works...
  • Group by step / How it works...

H

  • H2
    • about / Speeding up your transformation
  • headers
    • modifying / Changing headers
  • hello transformation / Sample transformation: Hello
  • hexadecimal notation / Alternative notation for a separator
  • hibernate database / Pentaho BI platform databases
  • HMAC / There's more...
  • HOST_NAME variable / Avoiding modifying jobs and transformations every time a connection changes
  • HSQLDB
    • about / Speeding up your transformation
  • HTML
    • about / Introduction, Creating a Pentaho report with data coming from PDI
  • HTML page
    • generating, XSL transformation used / How to do it..., How it works...
    • generating, XML transformation used / How to do it..., How it works...
  • HTTP Client step / How it works...
  • Hypersonic (HSQLDB)
    • about / Pentaho BI platform databases

I

  • ${Internal.Transformation.Filename.Directory} variable / How to do it...
  • identical flag / How it works...
  • identifiers / How it works...
  • id_author field / How to do it...
  • in-memory databases / Connecting to a database not supported by Kettle
  • INCREMENT parameter / How it works...
  • Infobright / Connecting to a database not supported by Kettle
  • information
    • retrieving, for file transfer / Getting information about the files being transferred
    • retrieving, about file-based transformations / Getting information about transformations and jobs (file-based), How to do it..., How it works...
    • retrieving, about file-based jobs / Getting information about transformations and jobs (file-based), How to do it..., How it works...
    • retrieving, about repository-based jobs / Getting information about transformations and jobs (repository-based), How it works...
    • retrieving, about repository-based transformations / Getting information about transformations and jobs (repository-based), How it works...
  • Informix / Connecting to a database not supported by Kettle
  • Info steps / Looking up information with additional steps
  • INNER join / There's more...
  • insert operation / Insert, update, and delete all-in-one
  • INSERT statement / How it works..., How it works...
  • internet
    • data, searching over / Looking for values over an intranet or Internet, How to do it..., How it works...
  • intranet
    • data, searching over / Looking for values over an intranet or Internet, How to do it..., How it works...
  • invoice headers / Master/detail files

J

  • Janino library
    • about / Getting ready
  • jar file / Connecting to a database not supported by Kettle
  • Jaro-Winkler algorithm / There's more...
  • Jaro algorithm / There's more...
  • Java
    • URL, for tutorials / There's more...
  • JavaScript step
    • executions, controlling / Using the JavaScript step to control the execution of the entries in your job
  • JDBC
    • about / Creating a Pentaho report with data coming from PDI
  • JNDI data source / How it works...
  • job, executing
    • by setting static arguments / Executing a job or a transformation by setting static arguments and parameters, How to do it..., How it works...
    • by setting parameters / Executing a job or a transformation by setting static arguments and parameters, How to do it..., How it works...
    • by setting static arguments dynamically / Getting ready, How to do it..., How it works...
    • by setting parameters dynamically / Getting ready, How to do it..., How it works...
    • job name, determining at runtime / Executing a job or a transformation whose name is determined at runtime, How to do it..., How it works...
  • job part
    • executing, for every row in dataset / Executing part of a job once for every row in a dataset, Getting ready, How to do it..., How it works...
    • executing, until true condition / Executing part of a job several times until a condition is true, How to do it..., How it works...
  • jobs
    • launching / Launching jobs and transformations
    • executing, by setting static arguments / Executing a job or a transformation by setting static arguments and parameters, How to do it..., How it works...
    • executing, by setting parameters / Executing a job or a transformation by setting static arguments and parameters, How to do it..., How it works...
    • job part, executing for every row in dataset / Executing part of a job once for every row in a dataset, Getting ready, How to do it..., How it works...
    • copied row, accessing from / Accessing the copied rows from jobs, transformations, and other entries
    • loops, implementing in / Implementing loops in a job
    • log files, isolating for / Isolating log files for different jobs or transformations
  • job XML nodes / Job XML nodes
  • Join Rows (Cartesian product) step / Limiting the number of output rows
  • joins options
    • about / There's more...
    • INNER join / There's more...
    • LEFT OUTER join / There's more...
    • RIGHT OUTER join / There's more...
    • FULL OUTER join / There's more...
  • Json
    • about / Working with Json files
    • URL, for info / There's more...
  • Json files
    • example / Working with Json files
    • working with / How to do it..., How it works...
    • reading, dynamically / Reading Json files dynamically
    • writing / Writing Json files
  • Json input step / How it works...
  • junk dimension tables / There's more...

K

  • ${KTR_NAME} variable / There's more...
  • Kettle
    • about / Introduction, How it works..., Introduction, Introduction
    • unsupported database connection / Connecting to a database not supported by Kettle
    • files, copying / Getting ready, How it works...
    • files, moving / Getting ready, How it works..., Moving files
    • files, deleting / Getting ready, How it works...
    • files, placing on remote server / Putting files on a remote server, How to do it..., How it works...
    • folders, comparing with files / Comparing files and folders, How to do it...
    • files, comparing with folders / Comparing files and folders, How to do it...
    • data, searching in database table / Looking for values in a database table, How to do it..., How it works...
    • data, searching in database / Getting ready, How it works...
    • data, searching in resources / Looking for values in a variety of sources, How to do it..., How it works...
    • data, searching by proximity / Looking for values by proximity, How to do it..., How it works...
    • streams, splitting / Splitting a stream into two or more streams based on a condition, Getting ready, How to do it..., How it works...
    • fields, comparing against constant value / Comparing against the value of a Kettle variable
    • fields, comparing against another field / Comparing against the value of a Kettle variable
    • rows, merging of two streams / Merging rows of two streams with the same or different structures, Getting ready, How to do it..., How it works...
    • streams, comparing / Comparing two streams and generating differences, How to do it..., How it works...
    • streams, joining on given conditions / Getting ready, How to do it..., How it works...
    • transformations, launching / Launching jobs and transformations
    • jobs, launching / Launching jobs and transformations
    • process flow, creating / Getting ready, How to do it..., How it works...
    • e-mails, sending with attached files / Sending e-mails with attached files, How to do it..., How it works...
    • custom log file, generating / Generating a custom log file, How to do it..., How it works...
    • custom functionality, programming / Getting ready, How to do it..., How it works...
    • sample data, generating for testing purpose / Generating sample data for testing purposes, How to do it..., How it works...
  • KettleComponent inputs
    • about / Supplying values for named parameters, variables and arguments
  • Kettle Franchising Factory (KFF) / Using Data grid step to generate specific data
  • keywords / How it works...
  • Kitchen documentation
    • URL / Launching jobs and transformations
  • kjb file / Avoiding creating the same database connection over and over again
  • KJube / Sending e-mails in a transformation, Using Data grid step to generate specific data
  • ktr file / Avoiding creating the same database connection over and over again, How it works...

L

  • last row
    • identifying, in stream / Identifying the last row in the stream
  • LEFT OUTER join / There's more...
  • Levenshtein algorithm / There's more...
  • libext/JDBC directory / Connecting to a database not supported by Kettle
  • lib folder / There's more...
  • location
    • specifying, of transformation / Specifying the location of the transformation
  • log files
    • about / Log files
    • filtering / Filtering the log file
    • clean log file, creating / Creating a clean log file
    • isolating, for different jobs / Isolating log files for different jobs or transformations
    • isolating, for transformations / Isolating log files for different jobs or transformations
  • logs
    • sending, through e-mail / Sending logs through an e-mail
    • customizing / Customizing logs
  • loops
    • implementing, in jobs / Implementing loops in a job
  • Loop XPath textbox / How it works...

M

  • Mail job entry
    • about / Sending e-mails in a transformation
  • Mail validator job entry / How it works...
  • master files / Master/detail files
  • Merge Join step / Joining two or more streams based on given conditions
  • Merge Rows (diff) step / How it works..., There's more...
  • Metaphone algorithm / There's more...
  • MJSV
    • about / Scripting alternatives to the UDJC step
  • modern column-oriented databases / Connecting to a database not supported by Kettle
  • Modified Java Script Value steps / Getting information about transformations and jobs (repository-based)
  • Mondrian cubes
    • about / There's more...
  • Mondrian distribution / Sample databases
  • MS SQL Server / Connecting to a database not supported by Kettle
  • multiple files
    • reading, at once / Reading several files at the same time, How to do it..., How it works...
    • generating, with different name / Generating several files simultaneously with the same structure, but different names
    • generating, with similar structure / Generating several files simultaneously with the same structure, but different names
    • copying / Getting ready, How it works...
    • moving / Getting ready, How it works...
    • deleting / Getting ready, How it works...
  • multiple nodes
    • retrieving / Getting more than one node when the nodes share their XPath notation
  • multiple sheets
    • Excel file, writing with / Writing an Excel file with several sheets, How to do it..., How it works...
  • Museums data structure / Museums, Cities
  • MySQL
    • about / Sample databases

N

  • name
    • providing, to files / Providing the name of a file (for reading or writing) dynamically, How to do it..., How it works...
  • named parameter / How to do it..., Getting variables in the middle of the stream, Sample transformation: Random list, Sample transformation: Sequence
    • values, supplying for / Supplying values for named parameters, variables and arguments
  • named parameters / How it works...
  • Native (JDBC) / How it works...
  • Needleman-Wunsch algorithm / There's more...
  • nested Filter Rows steps
    • avoiding / Avoiding the use of nested Filter Rows steps
  • new flag / How it works...
  • new rows
    • interspersing, between existent rows / Interspersing new rows between existent rows, How to do it..., How it works...
  • next_days step
    • preview / A sample transformation

O

  • ${OUTPUT_FOLDER} variable / Sample transformations, Getting ready, Getting ready
  • offices.txt file / How to do it...
  • OLAP
    • about / Creating a Pentaho report with data coming from PDI
  • OpenOffice calc files / How it works...
  • open source databases / Connecting to a database not supported by Kettle
  • Oracle / Connecting to a database not supported by Kettle
  • Oracle OCI connection / How it works...
  • ORDER BY clause / There's more...
  • ORDER_COLUMN parameter / How to do it...
  • Outdoor data structure / Products, Categories
  • outdoorProducts.txt file / Getting ready
  • output fields
    • format, providing to / Giving the output fields a format
  • output row number
    • limiting / Limiting the number of output rows
  • outputType parameter / How it works...

P

  • <Parameter> tag / How it works...
  • Pair letters Similarity algorithm / There's more...
  • param + <name of param.> parameter / How it works...
  • parameters
    • data, retrieving with / Getting data from a database by providing parameters, How it works..., Executing the SELECT statement several times, each for a different set of parameters
    • code, generalizing with / Generalizing your code
  • Parameters tab / How it works...
  • parent-child table
    • about / Loading a parent-child table
    • loading / How to do it..., How it works...
  • parent_job.getVariable() function / How it works...
  • parent_job.setVariable() function / How it works...
  • PDF
    • about / Creating a Pentaho report with data coming from PDI
  • PDI
    • database, altering with / Creating or altering a database table from PDI (design time), How to do it..., How it works...
    • database, creating from / Creating or altering a database table from PDI (design time), How to do it..., How it works..., Creating or altering a database table from PDI (runtime), How to do it..., There's more...
    • about / Inserting, deleting, or updating a table depending on a field, Introduction, Introduction, Introduction
    • simple XML files, reading / Reading simple XML files, How to do it..., How it works...
    • well-formed XML files, validating / Validating well-formed XML files, How to do it..., How it works...
    • URL, for wiki page / Introduction
    • files, generating with / Generating files from the PUC with PDI and the CDA plugin, How to do it..., How it works...
  • PDI data
    • Pentaho report, creating from / How to do it..., How it works...
  • PDI job
    • executing, from PUC / Getting ready, How to do it...
  • PDI transformations
    • executing, as part of Pentaho process / Executing a PDI transformation as part of a Pentaho process, How to do it..., How it works...
  • PDI transformations data
    • CDF dashboard, populating with / Getting ready, How to do it..., How it works...
  • Pentaho BI platform
    • about / Introduction
  • Pentaho BI platform databases
    • about / Pentaho BI platform databases
    • hibernate / Pentaho BI platform databases
    • quartz / Pentaho BI platform databases
    • sampledata / Pentaho BI platform databases
  • Pentaho BI Platform Demo / Pentaho BI platform databases
  • Pentaho BI Server
    • about / Configuring the Pentaho BI Server for running PDI jobs and transformations
    • configuring, for running PDI transformations / Getting ready, How it works...
    • configuring, for running PDI jobs / Getting ready, How it works...
  • Pentaho BI Suite Community Edition (CE)
    • about / Introduction
  • Pentaho Business Intelligence Suite
    • about / Introduction
  • Pentaho Data Integration Job process action
    • about / How it works...
  • Pentaho Data Integration process action
    • about / How it works...
  • Pentaho Design Studio
    • about / Getting ready, Getting ready
  • Pentaho developer
    • about / Pentaho BI platform databases
  • Pentaho report
    • creating, with data from PDI / How to do it..., How it works...
  • Pentaho Reporting Engine
    • about / Creating a Pentaho report with data coming from PDI
    • working / There's more...
  • Pentaho Server log / Log files
  • phonetic algorithms
    • about / There's more...
  • PDI
    • fields, specifying with XPath notation / Specifying fields by using XPath notation, Getting ready, How it works...
    • XML file, validating against DTD definitions / Validating an XML file against DTD definitions, Getting ready, How it works...
    • XML file, validating against XSD schema / Validating an XML file against an XSD schema, How to do it..., How it works...
    • simple XML document, generating / Generating a simple XML document, How it works...
    • complex XML structures, generating / Generating complex XML structures, Getting ready, How to do it..., How it works...
  • PostgreSQL / Connecting to a database not supported by Kettle
  • predicate
    • about / Getting data selectively
  • previous_result.getNrLinesOutput() function / How it works...
  • previous_result element / Using the JavaScript step to control the execution of the entries in your job
  • primary key
    • generating / Inserting new rows where a simple primary key has to be generated, How to do it..., How it works..., Inserting new rows where the primary key has to be generated based on stored values, How to do it..., How it works...
  • process flow
    • about / Creating a process flow
    • creating / Getting ready, How to do it..., How it works...
  • processRow() function / How to do it..., How it works...
  • PUC
    • about / There's more..., Executing a PDI job from the Pentaho User Console, Generating files from the PUC with PDI and the CDA plugin
    • PDI job, executing from / Getting ready, How to do it...
    • files, generating with CDA plugin / Generating files from the PUC with PDI and the CDA plugin, How to do it..., How it works...
    • files, generating with PDI / Generating files from the PUC with PDI and the CDA plugin, How to do it..., How it works...
  • Put a file with FTP job entry / How it works...

Q

  • quartz database / Pentaho BI platform databases
  • query
    • data, retrieving with / Getting data from a database by running a query built at runtime, How it works...

R

  • random list transformation / Sample transformation: Random list
  • RDBMS
    • about / Introduction
  • records
    • inserting, in tables / Alternative solution if you just want to insert records
  • reference stream / How it works...
  • Refined SoundEx algorithm / There's more...
  • remote server
    • files, retrieving from / How to do it..., How it works...
    • files, placing on / Putting files on a remote server, How to do it..., How it works...
  • report
    • data source, defining for / How to do it...
  • reporting
    • about / Keeping things simple when it's time to deliver a plain file
  • repositories
    • about / Introduction
  • repository-based jobs
    • information, retrieving of / Getting information about transformations and jobs (repository-based), How it works...
  • repository-based transformations
    • information, retrieving of / Getting information about transformations and jobs (repository-based), How it works...
  • Reservoir Sampling step / Working with subsets of your data
  • resources
    • data, searching in / Looking for values in a variety of sources, How to do it..., How it works...
  • result filelist feature / How it works..., Executing a transformation or part of a job once for every file in a list of files
  • RIGHT OUTER join / There's more...
  • Row flattener step / How it works...
  • rows
    • updating, in table / Inserting or updating rows in a table, How to do it..., How it works..., Alternative solution if you just want to update rows, Alternative way for inserting and updating
    • inserting, in table / Inserting or updating rows in a table, How to do it..., How it works..., Alternative way for inserting and updating
    • inserting, during simple primary key generation / Inserting new rows where a simple primary key has to be generated, How to do it..., How it works..., Inserting new rows where the primary key has to be generated based on stored values, How to do it..., How it works...
    • merging, of two streams with different structure / Merging rows of two streams with the same or different structures, Getting ready, How to do it..., How it works...
    • merging, of two streams with similar structure / Merging rows of two streams with the same or different structures, Getting ready, How to do it..., How it works...
    • merging, of streams / Telling Kettle how to merge the rows of your streams
    • processing, based on row number / Processing rows differently based on the row number, How to do it..., How it works...
  • Ruby Scripting plugin / Scripting alternatives to the UDJC step
  • runtime
    • database connection, modifying at / Changing the database connection at runtime, How to do it..., How it works...
  • R_DATABASE table / Database connections tables
  • R_DATABASE_ATTRIBUTE table / Database connections tables
  • R_DATABASE_CONTYPE table / Database connections tables
  • R_DATABASE_TYPE table / Database connections tables
  • R_JOBENTRY table / Job tables
  • R_JOBENTRY_ATTRIBUTE table / Job tables
  • R_JOBENTRY_DATABASE table / Database connections tables
  • R_JOBENTRY_TYPE table / Job tables
  • R_JOB table / Job tables
  • R_JOB_HOP table / Job tables
  • R_JOB_NOTE table / Job tables
  • R_STEP table / Transformation tables
  • R_STEP_ATTRIBUTE table / Transformation tables
  • R_STEP_DATABASE table / Database connections tables
  • R_STEP_TYPE table / Transformation tables
  • R_TRANSFORMATION table / Transformation tables
  • R_TRANS_HOP table / Transformation tables
  • R_TRANS_NOTE table / Transformation tables

S

  • sample data
    • generating, for testing purpose / Generating sample data for testing purposes, How to do it..., How it works...
  • sample databases
    • about / Sample databases
  • sampledata database / Pentaho BI platform databases, Getting ready
  • sampleFiles directory / Getting ready
  • sample transformations
    • about / Sample transformations, A sample transformation
    • creating / Sample transformations
    • hello transformation / Sample transformation: Hello
    • random list transformation / Sample transformation: Random list
    • sequence transformation / Sample transformation: Sequence
    • file list transformation / Sample transformation: File list
    • preview / A sample transformation
  • Secure Sockets Layer (SSL) / Access via FTPS
  • SELECT * statement / There's more...
  • SELECT statement / There's more..., Getting data from a database by providing parameters, How it works...
    • executing, multiple times / Executing the SELECT statement several times, each for a different set of parameters
  • separator
    • alternative notation for / Alternative notation for a separator
  • sequence transformation / Sample transformation: Sequence
  • serialize/de-serialize mechanism
    • about / Serializing/De-serializing data
  • Set files in result step / Executing a transformation or part of a job once for every file in a list of files
  • settings.xml file / How it works...
  • SFTP
    • about / Access via SFTP
    • files, accessing with / Access via SFTP
  • shared.xml file / Avoiding creating the same database connection over and over again
  • SimMetrics / There's more...
  • simple file
    • reading / Reading a simple file, How to do it..., How it works...
    • writing / Getting ready, How it works...
  • simple XML document
    • generating / Generating a simple XML document, How it works...
  • simple XML files
    • reading / Reading simple XML files, How to do it..., How it works...
  • SMTP server / How it works...
  • SoundEx algorithm / There's more...
  • specific rows
    • identifying / Identifying specific rows
  • SPEED parameter / A sample transformation
  • Spoon / How to do it...
  • stand-alone application
    • about / Introduction
  • Steel Wheels structure / Getting ready, Steel Wheels structure
  • steps
    • executing, empty stream condition / Executing steps even when your stream is empty, How to do it..., How it works...
  • stream
    • splitting, based on condition / Splitting a stream into two or more streams based on a condition, Getting ready, How to do it..., How it works...
  • Stream Lookup step / How it works...
    • alternatives / Looking for alternatives when the Stream Lookup step doesn't meet your needs
  • streams
    • rows, merging of / Telling Kettle how to merge the rows of your streams
    • comparing / Comparing two streams and generating differences, How to do it..., How it works...
    • joining, on given conditions / Getting ready, How to do it..., How it works...
    • last row, identifying in / Identifying the last row in the stream
  • streams, merging
    • similar metadata condition / Making sure that the metadata of the streams is the same
  • subtransformation
    • transformation part, moving to / Moving part of a transformation to a subtransformation, How to do it..., How it works..., There's more...
    • about / Moving part of a transformation to a subtransformation
    • creating / How to do it...
  • Switch / Case step / Avoiding the use of nested Filter Rows steps
  • Synchronize after merge step / How it works..., Insert, update, and delete all-in-one

T

  • table
    • rows, updating in / Inserting or updating rows in a table, How to do it..., How it works...
    • rows, inserting in / Inserting or updating rows in a table, How to do it..., How it works...
    • records, inserting in / Alternative solution if you just want to insert records
    • data, deleting from / Deleting data from a table, Getting ready, How to do it..., How it works...
    • updating / Getting ready, How to do it..., How it works..., Insert, update, and delete all-in-one
    • deleting / Getting ready, How to do it..., How it works..., Insert, update, and delete all-in-one
    • inserting / Getting ready, How to do it..., How it works..., Insert, update, and delete all-in-one
  • Task Scheduler
    • about / There's more...
  • TEMP parameter / A sample transformation
  • Text file input step / Reading a simple file
  • tmp extension / Deleting a custom list of files
  • tokens
    • about / Getting data selectively
  • total_lines variable / How it works...
  • traditional row-oriented databases / Connecting to a database not supported by Kettle
  • transformation
    • about / Introduction
    • e-mails, sending in / Sending e-mails in a transformation
    • log files, isolating for / Isolating log files for different jobs or transformations
  • transformation part
    • moving, to subtransformation / Moving part of a transformation to a subtransformation, How to do it..., How it works..., There's more...
  • transformations
    • about / Introduction
    • launching / Launching jobs and transformations
    • executing, by setting static arguments / Executing a job or a transformation by setting static arguments and parameters, How to do it..., How it works...
    • executing, by setting parameters / Executing a job or a transformation by setting static arguments and parameters, How to do it..., How it works...
    • copied row, accessing from / Accessing the copied rows from jobs, transformations, and other entries
    • executing, for every row in dataset / Executing a transformation once for every row in a dataset
    • location, specifying of / Specifying the location of the transformation
    • values, supplying for arguments / Supplying values for named parameters, variables and arguments
    • values, supplying for variables / Supplying values for named parameters, variables and arguments
    • values, supplying for named parameters / Supplying values for named parameters, variables and arguments
  • transformations, executing
    • by setting parameters / Executing a job or a transformation by setting static arguments and parameters, How to do it..., How it works...
    • by setting static arguments / Executing a job or a transformation by setting static arguments and parameters, How to do it..., How it works...
    • by setting parameters dynamically / Getting ready, How to do it..., How it works...
    • by setting static arguments dynamically / Getting ready, How to do it..., How it works...
    • transformation name, determining at runtime / Executing a job or a transformation whose name is determined at runtime, How to do it..., How it works...
  • transformation XML nodes / Transformation XML nodes
  • Transport Layer Security (TLS) / Access via FTPS
  • txt extension / Getting ready, How it works...

U

  • UDJC
    • about / Programming custom functionality
  • UDJC step
    • benefits / Programming custom functionality
    • dialog, exploring for / How it works...
    • scripting alternatives / Scripting alternatives to the UDJC step
  • UDJE
    • about / How to do it..., Scripting alternatives to the UDJC step
  • UDJE step / How it works...
  • unstructured file
    • writing / Writing an unstructured file, How to do it..., How it works...
  • unstructured files
    • reading / Reading unstructured files, Getting ready, How to do it..., How it works...
  • update operation / Insert, update, and delete all-in-one
  • UPDATE statement / How it works...
  • User Defined Java Expression step / How to do it..., Overcoming the difficulties of complex conditions
  • UUID / There's more...

V

  • Value Mapper step
    • using / Using the Value Mapper step for looking up from a short list of values
  • values
    • supplying, for arguments / Supplying values for named parameters, variables and arguments
    • supplying, for variables / Supplying values for named parameters, variables and arguments
    • supplying, for named parameters / Supplying values for named parameters, variables and arguments
  • variables
    • using, in database connection definition / Avoiding modifying jobs and transformations every time a connection changes
    • retrieving, in middle of stream / Getting variables in the middle of the stream
    • values, supplying for / Supplying values for named parameters, variables and arguments

W

  • W3C
    • URL, for XML / Introduction
  • Webdetails
    • about / There's more..., There's more...
  • web service consumption
    • avoiding / A sample transformation
  • Web service lookup step / Looking for values consuming a web service
  • web services
    • about / Looking for values consuming a web service
  • well-formed XML files
    • validating / Validating well-formed XML files, How to do it..., How it works...
  • WHERE clause / How it works...
  • Write to log step / How it works...
  • WSDL
    • about / How it works...

X

  • xls extension / Getting ready, How it works...
  • XML
    • about / Introduction, Creating a Pentaho report with data coming from PDI
  • XML, as field / XML data in a field
  • XML file
    • validating, against DTD definitions / Validating an XML file against DTD definitions, Getting ready, How it works...
    • validating, against XSD schema / Validating an XML file against an XSD schema, How to do it..., How it works...
  • XML file name, as field / XML file name in a field
  • XML structures
    • fields, generating with / Generating fields with XML structures
  • XML transformation
    • HTML page, generating with / How to do it..., How it works...
  • XPath
    • about / How it works...
  • XPath notation
    • fields, specifying with / Specifying fields by using XPath notation, Getting ready, How it works...
    • working / There's more...
    • multiple nodes, retrieving / Getting more than one node when the nodes share their XPath notation
  • XSD schema
    • XML file, validating against / Validating an XML file against an XSD schema, How to do it..., How it works...
  • XSL transformation
    • HTML page, generating with / How to do it..., How it works...

Z

  • ZIP files
    • working with / How to do it..., How it works...