Before you begin the exercises in the book, it is worth becoming familiar with some of the key concepts and best practices.
Keep code changes small and test often
When developing using Talend, as with any other development tool, it is recommended to code in short bursts and test (run) frequently.
By keeping each change small, it is much easier to find where and what has caused problems during compilation and execution.
Chapter 10, Debugging, Logging, and Testing, is dedicated to debugging and logging; however, working in small, frequently tested increments will often save you from debugging sessions that can take a long time.
Document your code
Talend sub-jobs can be given titles, and every component in Talend has an option for adding documentation about that component. Where you use Java, you should use the Java comment structures to document the code. Remember to use all these methods as you go along to ensure that your code is well documented.
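As a minimal sketch of the Java comment structures mentioned above, the following class shows a Javadoc comment, a block comment, and a line comment. The class name NameRoutines and the method cleanName are hypothetical and are used only for illustration; they are not part of Talend.

```java
import java.util.Locale;

/**
 * Hypothetical helper class that tidies up name values before they are loaded.
 * A Javadoc comment like this one describes a class or method for other developers.
 */
class NameRoutines {

    /**
     * Trims surrounding whitespace and converts a name to upper case.
     *
     * @param name the raw value read from the source; may be null
     * @return the cleaned name, or an empty string when the input is null
     */
    public static String cleanName(String name) {
        /* Block comment: source extracts often contain nulls,
           so handle them before calling any String methods. */
        if (name == null) {
            return "";
        }
        // Line comment: Locale.ROOT keeps the result independent of the machine's locale.
        return name.trim().toUpperCase(Locale.ROOT);
    }
}
```

Javadoc comments suit anything that other developers will call, while block and line comments are better for explaining why a particular step is needed.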
Contexts and globalMap
context and globalMap are global areas used to store data that can be used by all components within a Talend job. context variables are predefined prior to job execution in a context group, whereas globalMap variables are created on the fly at any point within a job.
Context variables
Context variables are used by Talend to store parameter information, and can be used:
- To pass information into a job from the command line and/or a parent job
- To manage values of parameters between environments
- To store values within a job or set of jobs
Chapter 6, Managing Context Variables, is dedicated to the use and management of context variables within Talend.
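The first two uses listed above can be sketched in plain Java. This is a stand-in under stated assumptions, not the code that Talend generates: the class ContextSketch, its methods, and the variable names dbHost and inputDir are hypothetical, and the sketch assumes that overrides arrive on the command line as pairs of the form --context_param name=value.

```java
import java.util.Properties;

class ContextSketch {

    /** Builds a stand-in context holding hypothetical default values for one environment. */
    public static Properties defaultContext() {
        Properties context = new Properties();
        context.setProperty("dbHost", "localhost"); // hypothetical development default
        context.setProperty("inputDir", "/tmp/in"); // hypothetical development default
        return context;
    }

    /** Applies command-line overrides of the assumed form: --context_param name=value */
    public static Properties applyArgs(Properties context, String[] args) {
        for (int i = 0; i < args.length - 1; i++) {
            if ("--context_param".equals(args[i])) {
                String pair = args[i + 1];
                int eq = pair.indexOf('=');
                if (eq > 0) {
                    // Everything before the first '=' is the name, the rest is the value.
                    context.setProperty(pair.substring(0, eq), pair.substring(eq + 1));
                }
                i++; // skip the name=value argument that was just consumed
            }
        }
        return context;
    }

    public static void main(String[] args) {
        Properties context = applyArgs(defaultContext(), args);
        System.out.println("dbHost=" + context.getProperty("dbHost"));
        System.out.println("inputDir=" + context.getProperty("inputDir"));
    }
}
```

Running the sketch with the arguments --context_param dbHost=prod-db overrides only dbHost and leaves inputDir at its default, which is the behavior that makes context variables useful for moving a job between environments.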
globalMap
globalMap is a very important construct within Talend, in that:
- Almost every component will write information to globalMap once it completes execution (for example, NB_LINE is the number of rows processed in a component).
- Certain components, such as tFlowToIterate or tFileList, will store data in globalMap variables for use by downstream components.
- Developers can read and write to globalMap to create global variables in an ad hoc fashion.
The use of global variables can often be the best way to ensure code is simple and efficient.
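The reading and writing described above can be sketched in plain Java by treating globalMap as a map from String keys to Object values. The sketch makes several assumptions: a HashMap stands in for the job's globalMap, the component name tFileInputDelimited_1 and the key batchId are hypothetical, and a component's row count is assumed to be stored under a key made of the component name followed by _NB_LINE.

```java
import java.util.HashMap;
import java.util.Map;

class GlobalMapSketch {

    /** Reads a component's row count, returning 0 if the component has not written one yet. */
    public static int rowCount(Map<String, Object> globalMap, String componentName) {
        // Assumed key convention: <component name>_NB_LINE
        Object value = globalMap.get(componentName + "_NB_LINE");
        return value == null ? 0 : (Integer) value;
    }

    public static void main(String[] args) {
        // Stand-in for the job's globalMap; inside a real job the map already exists.
        Map<String, Object> globalMap = new HashMap<>();

        // Simulates a component writing its row count when it completes.
        globalMap.put("tFileInputDelimited_1_NB_LINE", 42);

        // An ad hoc global variable created by the developer (hypothetical key and value).
        globalMap.put("batchId", "nightly-load");

        // Values come back as Object, so they are cast to the expected type on read.
        int rows = rowCount(globalMap, "tFileInputDelimited_1");
        String batchId = (String) globalMap.get("batchId");
        System.out.println(batchId + " processed " + rows + " rows");
    }
}
```

Because every value is stored as an Object, each read needs a cast, and a key that has not been written yet returns null. Checking for null, as rowCount does, avoids a NullPointerException when a downstream component runs before the value exists.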
Java
Talend is a Java code generator, so having a little Java knowledge can help when using Talend. There are many Java tutorials for beginners online, and a little time spent learning the basics will help speed up your understanding of Talend.
Other background knowledge
As a data integrator, you will be expected to understand many technologies and how to interface with them, and this book assumes a basic knowledge of many of the most common data sources and targets.
Chapter 7, Working with Databases, relates to using Talend with databases. We have chosen to use MySQL, because it is quick to install, simple to use, and readily available. Basic knowledge of SQL and MySQL will therefore be required to perform the exercises in this chapter.
Other chapters will also assume knowledge of CSV files, MS Excel, XML, and web services.