Optimizing MDX queries using the NonEmpty() function
The NonEmpty() function is a very powerful MDX function. It is primarily used to improve query performance by reducing sets before the result is returned.
Both the Customer and Date dimensions are relatively large in the Adventure Works DW 2012 database. Putting the cross product of these two dimensions on a query axis can take a long time. In this recipe, we'll show how the NonEmpty() function can be used on the Customer and Date dimensions to improve query performance.
Getting ready
Start a new query in SSMS and make sure that you're working on the Adventure Works DW 2012 database. Then write the following query and execute it:
SELECT
   { [Measures].[Internet Sales Amount] } ON 0,
   NON EMPTY
   Filter(
      { [Customer].[Customer].[Customer].MEMBERS } *
      { [Date].[Date].[Date].MEMBERS },
      [Measures].[Internet Sales Amount] > 1000
   ) ON 1
FROM
   [Adventure Works]
The query shows the sales per customer and dates of their purchases, and isolates only those combinations where the purchase was over 1000 USD.
On a typical server, the query takes more than a minute to return its results.
Now let's see how to improve the execution time by using the NonEmpty() function.
How to do it…
Follow these steps to improve the query performance by adding the NonEmpty() function:
Wrap NonEmpty() around the cross join of customers and dates so that it becomes the first argument of that function.
Use the measure on columns as the second argument of that function.
This is what the MDX query should look like:
SELECT
   { [Measures].[Internet Sales Amount] } ON 0,
   NON EMPTY
   Filter(
      NonEmpty(
         { [Customer].[Customer].[Customer].MEMBERS } *
         { [Date].[Date].[Date].MEMBERS },
         { [Measures].[Internet Sales Amount] }
      ),
      [Measures].[Internet Sales Amount] > 1000
   ) ON 1
FROM
   [Adventure Works]
Execute that query and observe the results as well as the time required for execution. The query returned the same results, only much faster, right?
How it works…
Both the Customer and Date dimensions are medium-sized dimensions. The cross product of these two dimensions contains several million combinations. We know that typically, the cube space is sparse; therefore, many of these combinations are indeed empty. The Filter() operation is not optimized to work in block mode, which means a lot of calculations will have to be performed by the engine to evaluate the set on rows, whether the combinations are empty or not.
Fortunately, the NonEmpty() function exists. This function can be used to reduce any set, especially multidimensional sets that are the result of a cross join operation. It removes the empty combinations of the two sets before the engine starts to evaluate the sets on rows. A reduced set has fewer cells to be calculated, and therefore the query runs much faster.
There's more…
Regardless of the benefits that were shown in this recipe, NonEmpty() should be used with caution. Here are some good practices regarding the NonEmpty() function:
Use it with sets, such as named sets and axes.
Use it in functions that are not optimized to work in block mode, such as the Filter() function.
Avoid using it in aggregate functions such as Sum().
Avoid using it in other MDX set functions that are optimized to work in block mode. The use of NonEmpty() inside optimized functions will prevent them from evaluating the set in block mode. This is because the set will not be compact once it passes through the NonEmpty() function. The function will break it into many small non-empty chunks, and each of these chunks will have to be evaluated separately. This will inevitably increase the duration of the query. In such cases, it is better to leave the original set intact, no matter its size. The engine will know how to run over it in optimized mode.
NonEmpty() versus NON EMPTY
Both the NonEmpty() function and the NON EMPTY keyword can reduce sets, but they do it in different ways.
The NON EMPTY keyword removes empty rows, columns, or both, depending on the axis on which that keyword is used in the query. To do so, the NON EMPTY operator tries to push the evaluation of cells to an early stage whenever possible. This way, the set on the axis is already reduced and the final result is returned faster.
Take a look at the initial query in this recipe, remove the Filter() function, run the query, and notice how quickly the results come back, even though the multidimensional set again contains millions of tuples. The trick is that the NON EMPTY operator uses the set on the opposite axis, the columns, to reduce the set on rows. Therefore, it can be said that NON EMPTY is highly dependent on members on axes and their values in columns and rows.
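For reference, the initial query with the Filter() function removed could look like the following sketch; it keeps the same measure and the same cross join, with only NON EMPTY left to reduce the rows.

```mdx
// Initial query from this recipe without Filter():
// NON EMPTY uses the measure on columns to drop empty rows.
SELECT
   { [Measures].[Internet Sales Amount] } ON 0,
   NON EMPTY
   { [Customer].[Customer].[Customer].MEMBERS } *
   { [Date].[Date].[Date].MEMBERS } ON 1
FROM
   [Adventure Works]
```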
Contrary to the NON EMPTY operator, which is found only on axes, the NonEmpty() function can be used anywhere in the query.
The NonEmpty() function removes from its first set all the tuples that are empty for every member of the second set, which typically contains one or more measures. If no measure is specified, the function is evaluated in the context of the current member.
In other words, the NonEmpty() function is highly dependent on members in the second set, the slicer, or the current coordinate, in general.
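To show NonEmpty() used away from an axis definition, here is a minimal sketch that places it in a named set. The set name [Customers With Internet Sales] is an assumption made for this example.

```mdx
WITH
SET [Customers With Internet Sales] AS   // hypothetical name
   NonEmpty(
      [Customer].[Customer].[Customer].MEMBERS,
      { [Measures].[Internet Sales Amount] }   // explicit second set
   )
SELECT
   { [Measures].[Internet Sales Amount] } ON 0,
   [Customers With Internet Sales] ON 1
FROM
   [Adventure Works]
```

Because the second set names the measure explicitly, the named set does not depend on whichever measure happens to be current when it is evaluated.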
Common mistakes and useful tips
If a second set is not provided in the NonEmpty() function, the expression is evaluated in the context of the current measure at the moment of evaluation, as well as the current members of the attribute hierarchies at that moment. In other words, if you're defining a calculated measure and you forget to include a measure in the second set, the expression is evaluated for that same calculated measure, which leads to null, the default initial value of every measure. If you're simply evaluating the set on an axis, it will be evaluated in the context of the current measure: the default measure of the cube or the one provided in the slicer. Again, this is perhaps not what you expected. To prevent these problems, always include a measure in the second set.
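A minimal sketch of this tip, assuming a calculated measure named [Customers With Sales] (a name invented for this example): the second argument names the measure explicitly, so NonEmpty() is not evaluated against the calculated measure itself.

```mdx
WITH
MEMBER [Measures].[Customers With Sales] AS   // hypothetical name
   Count(
      NonEmpty(
         [Customer].[Customer].[Customer].MEMBERS,
         // Always state the measure; omitting it would make the
         // current (calculated) measure the evaluation context.
         { [Measures].[Internet Sales Amount] }
      )
   )
SELECT
   { [Measures].[Customers With Sales] } ON 0,
   { [Date].[Calendar Year].[Calendar Year].MEMBERS } ON 1
FROM
   [Adventure Works]
```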
NonEmpty() reduces sets, just like a few other functions and keywords, namely Filter() and EXISTING, do. What's special about NonEmpty() is that it reduces sets extremely efficiently and quickly. Because of that, there are some rules about where to position NonEmpty() in calculations built by composing MDX functions (one function wrapping the other). If we're trying to detect multi-select, that is, multiple members in the slicer, NonEmpty() should go inside, with the EXISTING keyword outside. The reason is that although they both shrink sets efficiently, NonEmpty() works best when the set is intact, whereas EXISTING is not affected by the order of members or the compactness of the set. Therefore, NonEmpty() should be applied first.
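The ordering described above might be sketched as follows, with NonEmpty() applied to the intact set and EXISTING applied to its result. The calculated measure name [Selected Countries] and the two countries placed in the slicer are assumptions chosen for illustration; how EXISTING treats a multi-member slicer can depend on the server version, so treat this as a starting point rather than a definitive multi-select test.

```mdx
WITH
MEMBER [Measures].[Selected Countries] AS   // hypothetical name
   Count(
      EXISTING                              // outer: restrict to current context
      NonEmpty(                             // inner: runs over the intact set
         [Customer].[Country].[Country].MEMBERS,
         { [Measures].[Internet Sales Amount] }
      )
   )
SELECT
   { [Measures].[Selected Countries] } ON 0
FROM
   [Adventure Works]
WHERE
   ( { [Customer].[Country].&[Australia],
       [Customer].[Country].&[Canada] } )
```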
You may get System.OutOfMemory errors if you use the CrossJoin() operation on many large hierarchies because the cross join generates a Cartesian product of those hierarchies. In that case, consider using NonEmpty() to reduce the space to a smaller subcube. Also, don't forget to group the hierarchies by their dimension inside the cross join.
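As a hedged sketch of that last tip, the query below keeps the two Customer hierarchies next to each other in the cross join, places the Date hierarchy after them, and wraps the whole cross join in NonEmpty(). The choice of the Country, Gender, and Calendar Year hierarchies is an assumption made for illustration.

```mdx
SELECT
   { [Measures].[Internet Sales Amount] } ON 0,
   NonEmpty(
      // Hierarchies of the same dimension are grouped together.
      [Customer].[Country].[Country].MEMBERS *
      [Customer].[Gender].[Gender].MEMBERS *
      [Date].[Calendar Year].[Calendar Year].MEMBERS,
      { [Measures].[Internet Sales Amount] }
   ) ON 1
FROM
   [Adventure Works]
```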