Designing a Custom Data Flow Component
This recipe demonstrates the design of a custom data flow transformation. Efficient resource use is one of the principal objectives in ETL development, and the ability to determine which data extracted from the source actually needs to be loaded into the destination is probably the most important capability of any ETL solution. Whether an incoming row differs from the corresponding existing row can be determined by comparing each source column with the corresponding destination column. Such comparisons can be costly, as they require all relevant data to be loaded from the destination table.
By creating a single hash value based on the values of all the columns in the incoming row, and comparing only this single value with the one stored in the destination table, resource use can be reduced significantly. Of course, the hashed values are restricted in size and none of the algorithms...
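The hash-based comparison described above can be sketched in a few lines. This is only an illustration of the idea in Python, not the component built in this recipe: the function names `row_hash` and `needs_load`, the choice of SHA-256, and the length-prefixed encoding of column values are all assumptions made for the example.

```python
import hashlib


def row_hash(row, columns):
    """Return a hex digest computed over the selected column values.

    Hypothetical helper: each value is written as "<length>:<text>" so that
    neighbouring columns cannot run together ("ab" + "c" vs "a" + "bc"),
    and a missing value (None) is written as "-1:" so it differs from "".
    """
    parts = []
    for name in columns:
        value = row.get(name)
        if value is None:
            parts.append("-1:")
        else:
            text = str(value)
            parts.append(f"{len(text)}:{text}")
    payload = "".join(parts).encode("utf-8")
    # SHA-256 is an assumption; any stable hash function could be used here.
    return hashlib.sha256(payload).hexdigest()


def needs_load(source_row, columns, stored_hash):
    """True if the row is new (no stored hash) or its hash has changed."""
    if stored_hash is None:
        return True
    return row_hash(source_row, columns) != stored_hash
```

In use, the destination would store one hash per row next to the business key, so that only the key and the hash have to be read back for the comparison, for example `needs_load(incoming_row, ["name", "city"], stored_hash)`.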