Using Exists and Keep to limit the data load
Quite often, we need to restrict the amount of data that we are loading. We can usually do this by comparing the value in a particular field to some other value (for example, Year > 2010), but we might also need to load the data based on the values that have already been loaded into memory.
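As a minimal sketch of the two kinds of restriction, the following script first filters on a fixed value and then filters on values that are already in memory. The Orders and Targets tables and their fields are hypothetical and are used only for illustration; they are not part of this recipe's data.

```qlik
// Hypothetical example tables, not part of the recipe's data.

// Restriction by comparing a field to a fixed value:
// only the 2011 and 2012 rows pass the Where clause.
Orders:
LOAD * INLINE [
Year, Amount
2009, 100
2011, 200
2012, 300
]
Where Year > 2010;

// Restriction by values already loaded into memory:
// Exists(Year) is true only for Year values that were loaded earlier,
// so the 2010 and 2013 rows are skipped.
Targets:
LOAD * INLINE [
Year, Target
2010, 150
2011, 250
2012, 350
2013, 450
]
Where Exists(Year);
```

Under these assumptions, both tables end up holding only the years 2011 and 2012, because the second load is limited by what the first load kept.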
Getting ready
Load the following script:
Sales:
LOAD * INLINE [
Year, Country, SalesPersonID, Sales
2011, Germany, 1, 1233
2012, Germany, 1, 2133
2013, Germany, 1, 3421
2011, UK, 2, 1567
2012, UK, 2, 2244
2013, UK, 2, 2567
2011, USA, 3, 1098
2012, USA, 3, 1123
2013, USA, 3, 1456
];

SalesPerson:
LOAD SPID As SalesPersonID,
     SalesPersonName as SalesPerson;
Load * INLINE [
SPID, SalesPersonName
1, John Smith
2, Jayne Volta
3, Graham Brown
4, Anita Weisz
];

Budget:
LOAD * INLINE [
Year, Country, Budget
2012, Germany, 2100
2013, Germany, 3100
2014, Germany, 4100
2012, UK, 2100
2013, UK, 2600
2014, UK, 3100
2012, USA...