Managing materialized views
A materialized view is a database object that persists the results of a query to disk. In Amazon Redshift, materialized views allow frequently used complex queries to be stored as separate database objects, allowing you to access these database objects directly, and enabling faster query responses.
Employing materialized views is a common approach to powering repeatable queries in a business intelligence (BI) dashboard, and avoids expensive computation each time. Furthermore, materialized views allow an incremental refresh of the results, using the underlying table data. In this recipe, we will create a materialized view to query the tables and also to persist the results to fetch the data more quickly.
Getting ready
To complete this recipe, you will need access to any SQL interface such as a SQL client or a query editor.
How to do it…
Let's create a materialized view using the CREATE MATERIALIZED VIEW
command. We will use the following steps to create a materialized view, in order to store the precomputed results of an analytical query and also see how to refresh it:
- Create a
finance.customer_agg_mv
materialized view using the results of the query based onfinance.customer
:CREATE MATERIALIZED VIEW finance.customer_agg_mv AS SELECT EXTRACT(year FROM date_of_birth) AS year_of_birth, count(1) customer_cnt FROM finance.customer group by EXTRACT(year FROM date_of_birth);
- We can now select directly from
finance.customer
, just like with any another database object, like so:select * from finance.customer limit 5;
This is the expected output:
outputyear_of_birth,customer_cnt 1975 1 1979 1 1995 1 1970 1 1965 1
- You can verify the state of a materialized view by using a
STV_MV_INFO
system table (https://docs.aws.amazon.com/redshift/latest/dg/r_STV_MV_INFO.html):select * from STV_MV_INFO where name='customer_agg_mv';
This is the expected output:
outputdb_name,schema,name,updated_upto_xid,is_stale,owner_user_name,state,autorefresh, autorewrite vdwpoc finance customer_agg_mv 24642401 f vdwadmin 1 f t
Here,
stale='f'
indicates the data is current, reflecting thedaily_product_reviews
underlying base table. This column can be used to refresh the materialized view when needed. Another key column in theSTV_MV_INFO
table is thestate
column, which indicates if an incremental refresh is possible(state=1)
or not(state=0)
. In the materialized view we created astate=1
state, which indicates a faster incremental refresh is possible. - Now, let's load more data into the underlying table
finance.customer,
using the following command, and check theSTV_MV_INFO
table:insert into finance.customer values (11, 'mark', 'bar', '1980-02-01'), (12, 'pete', 'smith', '1990-2-01'), (13, 'woofy', 'spock', '1980-11-01'), (14, 'woofy jr', 'scotty', '1975-03-01'), (15, 'eleven', 'of nine', '1990-07-01');
- Query the
STV_MV_INFO
table again to check the status of the materialized view:select name,is_stale,state from STV_MV_INFO where name='customer_agg_mv';
This is the expected output:
name,is_stale,state customer_agg_mv t 1
Note that
stale = 't'
indicates that the underlying data for the materialized view has changed, but it is possible to refresh it incrementally. - Refresh the materialized view using the
REFRESH MATERIALIZED VIEW
command and check the status again:REFRESH MATERIALIZED VIEW finance.customer_agg_mv; select name,is_stale, state from STV_MV_INFO where name='customer_agg_mv';
This is the expected output:
name,is_stale,state customer_agg_mv f 1
As we can see from the preceding code snippet,
customer_agg_mv
is now updated to reflect the underlying table data.
How it works…
A materialized view can be updated with the latest data from the underlying tables by using the REFRESH MATERIALIZED VIEW
command. When the materialized view is being refreshed, it executes a separate transaction to update the dataset. Amazon Redshift also supports an autorefresh option to keep the materialized view up to date as soon as possible after base tables change.