Introduction
XML (Extensible Markup Language) is a markup language used to describe data in a format that both humans and machines can understand. This contrasts with HTML, which was designed to display data in a web browser. XML is self-descriptive because its tags are not predefined: document authors define their own tags to suit the data. XML documents are used not only to store data, but also to exchange data between systems.
XML is a W3C (World Wide Web Consortium) Recommendation. You will find the details at the following URL: http://www.w3.org/XML/. PDI (Pentaho Data Integration) has a rich set of steps and job entries for manipulating XML structures. The recipes in this chapter are meant to teach you how to read, write, and validate XML using those features.
Note
Most of the recipes are based on a database with books and authors. To learn more about the structure of that database, see Appendix A, Data Structures, or the examples in Chapter 1, Working with Databases.
The recipes assume that you know the basics of XML, that is...