Preface
We've all been there. Your boss drops you an e-mail saying:
Good news, we've just bought system X, which is going to make our lives a lot easier. First though, we need to hook it up to system Y for daily product and inventory feeds and system Z to post the financials back for invoicing. Should be easy, right? It's going to be live in two months. Any problems, please let me know. Oh... if you can get some extracts for the data warehouse at the same time, that would be great too.
What to do? Well, you could ask your senior developer to code some integration jobs from scratch, but they might be hard to maintain, particularly if he/she left the company. In addition, you know he/she is working flat out on another important project. Alternatively, you could ask your boss if you can invest in a proprietary integration suite, with a legion of highly paid consultants. That will certainly do the job, but the budget and timeline might not stretch to this.
Or you can take the new junior developer who joined your company a couple of weeks ago, dust off your business analyst and testing skills, and get the job done on time, on budget with Talend Open Studio for Data Integration.
Getting Started with Talend Open Studio for Data Integration is an introductory guide to solving this problem and many others like it.
What this book covers
Chapter 1, Knowing Talend Open Studio, introduces the reader to Talend Open Studio for Data Integration and what it can be used for. It also covers the installation of Talend Open Studio for Data Integration.
Chapter 2, Working with Talend Open Studio, introduces some common concepts the reader will come across when using Talend Open Studio for Data Integration, including creating a workspace to contain integration jobs, a tour of the Talend Open Studio for Data Integration interface, and use of metadata and schemas. We'll also build a simple "hello world" job.
Chapter 3, Transforming Files, gets into the detail of integrations and looks at using Talend Open Studio for Data Integration to transform files from one format to another.
Chapter 4, Working with Databases, looks at databases—how to get data out and how to get data in.
Chapter 5, Filtering, Sorting, and Other Processing Techniques, introduces common data operations: filtering, sorting, and aggregating.
Chapter 6, Managing Files, shows how to manage files during integration jobs. We'll look at renaming, moving, copying, and deleting files; how to timestamp a file; connecting to remote servers to FTP files; and zipping and unzipping files.
Chapter 7, Job Orchestration, will look at more complex integrations and how "one-shot" tasks can be combined to form multi-step jobs. We'll create subjobs and link them together using "if/then" logic. Integrations often produce temporary files, so we'll look at ways to clean up afterwards.
Chapter 8, Managing Jobs, covers the process of packaging, deploying, and scheduling jobs in a live environment.
Chapter 9, Global Variables and Contexts, looks at contexts and explores how the same job can be used in different environments. It also introduces dynamic variables, which allow integration jobs to run flexibly based on current runtime information rather than relying on complex, hardcoded routines.
Chapter 10, Worked Examples, brings together all of the knowledge from previous chapters in a series of worked examples. A real-life integration project is explored and developed to illustrate the use of Talend Open Studio for Data Integration "in the wild".
Appendix A, Installing Sample Jobs and Data, details how to obtain and use the sample data files required to follow the job development examples in the book. All of the jobs created throughout the book are also provided for reference.
Appendix B, Resources, highlights some resources and further reading to expand your knowledge of Talend Open Studio for Data Integration.
What you need for this book
The hardware and software requirements for this book are:
A computer running Windows, Linux, or Mac OS with Java installed
Talend Open Studio for Data Integration
A text file/XML editor
A MySQL database instance
Who this book is for
This book is for developers, business analysts, project managers, business intelligence specialists, system architects, and consultants who need to undertake integration projects. The book assumes a certain level of technical aptitude and readers should be comfortable with some of the following concepts and technologies:
Relational database management systems with some SQL (structured query language) experience
XML
Java
File Transfer Protocol (FTP)
Programming flow and logic
Conventions
In this book, you will find a number of styles of text that distinguish between different kinds of information. Here are some examples of these styles, and an explanation of their meaning.
Code words in text are shown as follows: "Create a file delimited metadata for the currencies.csv file."
A block of code is set as follows:
String datestamp=TalendDate.getDate("YYYYMMDD");
globalMap.put("dateStamp",datestamp);
Any command-line input or output is written as follows:
sh [file name].sh
New terms and important words are shown in bold. Words that you see on the screen, in menus or dialog boxes for example, appear in the text like this: "Go to the Debug Run tab and click on Traces Debug".
Note
Warnings or important notes appear in a box like this.
Tip
Tips and tricks appear like this.
Reader feedback
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to <feedback@packtpub.com>, and mention the book title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Customer support
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Downloading the example code
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Errata
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/support, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website, or added to any list of existing errata, under the Errata section of that title.
Piracy
Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.
We appreciate your help in protecting our authors, and our ability to bring you valuable content.
Questions
You can contact us at <questions@packtpub.com> if you are having a problem with any aspect of the book, and we will do our best to address it.