Chapter 12. Debatching Bulk Data
Debatching data is the process of turning one huge pile of data into many small piles of data.
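As a minimal sketch of the idea (not a prescribed implementation), the Python generator below splits any stream of records into small piles of a fixed size. The function name `debatch` and the `size` parameter are hypothetical, chosen only for illustration.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def debatch(records: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` records (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(records)
    while True:
        # Pull the next small pile off the big pile without loading it all.
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Five records split into piles of two; the last pile is smaller.
    for pile in debatch(["a", "b", "c", "d", "e"], 2):
        print(pile)
```

Because the generator reads lazily from the input, the full batch never has to sit in memory at once. A `size` of 1 gives the extreme case of one record per pile.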
Why is it better to shovel one ton of data using two thousand one-pound shovels instead of one big load from a huge power shovel? After all, large commercial databases and their attendant bulk loader or SQL*Loader programs are designed to do just that: insert huge loads of data in a single shot.
The bulk load approach works only under certain tightly constrained circumstances:
1. The "bulk" data comes to you already matching the table structure of the destination system. Of course, this may mean that it was debatched before it got to your system.
2. The destination system can tolerate some error rate, potentially a significant one, when individual rows fail to load.
3. There are no updates or deletes, just inserts.
4. Your destination system can handle bulk loads. Certain systems (for example, some legacy medical systems or other proprietary systems) cannot handle...