Working with Materialized Views¶
A materialized view is a pre-computed data set derived from a query specification (the SELECT list in the view definition) and stored for later re-use. Because the data is pre-computed, querying a materialized view is faster than executing the original query. This performance difference can be significant when a query is run frequently or is sufficiently complex.
Materialized views are designed to improve query performance of workloads composed of common, repeated query patterns. However, materializing intermediate results incurs additional costs and, therefore, should be offset by the frequency of usage for the materialized views.
In this Topic:
- When to Use Materialized Views
- Creating and Using Materialized Views
- Access Control Privileges
- Maintenance Costs for Materialized Views
- Materialized Views and Clustering
- Tips for Materialized Views
When to Use Materialized Views¶
Materialized views are particularly useful when:
- Query results contain a small number of rows and/or columns relative to the base table (the table on which the view is defined).
- Query results require significant processing, including:
- Analysis of semi-structured data.
- Aggregates that take a long time to calculate.
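As a rough illustration of the aggregate case, the sketch below pre-computes a per-page summary over a hypothetical `page_events` table. The table, view, and column names are assumptions made for illustration and do not appear elsewhere in this topic.

```sql
-- Hypothetical base table (names are illustrative only).
CREATE TABLE page_events (
    page_ID INTEGER,
    event_date DATE,
    duration_ms INTEGER
);

-- Pre-compute a per-page aggregate. The view holds one row per page,
-- which is small relative to the base table.
CREATE MATERIALIZED VIEW page_totals AS
    SELECT page_ID,
           COUNT(*) AS event_count,
           SUM(duration_ms) AS total_duration_ms
    FROM page_events
    GROUP BY page_ID;

-- Queries read the pre-computed rows instead of re-aggregating the base table.
SELECT page_ID, event_count, total_duration_ms
    FROM page_totals
    WHERE page_ID = 42;
```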
Advantages of Materialized Views¶
The Snowflake implementation of materialized views provides a number of unique characteristics:
- Materialized views can improve the performance of queries that use the same subquery results repeatedly.
- Materialized views are automatically and transparently maintained by Snowflake. A background service updates the materialized view after changes are made to the base table. This is more efficient and less error-prone than manually maintaining the equivalent of a materialized view at the application level.
- Data accessed through materialized views is always current, regardless of the amount of DML that has been performed on the base table. If a query is run before the materialized view is up-to-date, Snowflake either updates the materialized view or uses the up-to-date portions of the materialized view and retrieves any required newer data from the base table.
The automatic maintenance of materialized views consumes credits. For more details, see Maintenance Costs for Materialized Views (in this topic).
Materialized Views vs. Regular Views¶
In general, when deciding whether to create a materialized view or a regular view, use the following criteria:
- Create a materialized view when all of the following are true:
- The query results from the view don’t change often. This almost always means that the underlying/base table for the view doesn’t change often, or at least that the subset of base table rows used in the materialized view doesn’t change often.
- The results of the view are used often (typically significantly more often than the query results change).
- The query consumes a lot of resources. Typically, this means that the query consumes a lot of processing time or credits, but it could also mean that the query consumes a lot of storage space for intermediate results.
- Create a regular view when any of the following are true:
- The results of the view change often.
- The results are not used often (relative to the rate at which the results change).
- The query is not resource intensive so it is not costly to re-run it.
Note that these criteria are just guidelines. A materialized view might provide benefits even if it is not used often — as long as the results change less frequently than the usage of the view.
Also, there are other factors to consider when deciding whether to use a regular view or a materialized view.
For example, the cost of storing the query results for the materialized view might be a factor; if the results are not used very often (even if they are used more often than they change), then the additional storage costs might not be worth the performance gain.
Comparison with Tables, Regular Views, and Cached Results¶
Materialized views are similar to tables in some ways and similar to regular (i.e. non-materialized) views in other ways. In addition, materialized views have some similarities with cached results, particularly because both enable storing query results for future re-use.
This section compares some of the similarities and differences between these objects in a number of different areas, including:
- Query performance.
- Query security.
- Reduced query logic complexity.
- Data clustering (related to query performance).
- Storage and maintenance costs.
Snowflake caches query results for a short period of time after a query has been run. In some situations, if the same query is re-run, then Snowflake can simply return the same results without re-running the query. This is the fastest and most efficient form of re-use, but also the least flexible. For more details, see Using Persisted Query Results.
Both materialized views and cached query results provide query performance benefits:
- Materialized views are more flexible, but typically slower, than cached results.
- Materialized views are faster than tables because of their “cache” (i.e. the query results for the view); in addition, if data has changed, they can use their “cache” for data that hasn’t changed and the base table for any data that has changed.
Regular views do not cache any data, and therefore generally do not offer any query performance benefits. However, both materialized views and regular views enable enhanced data security by allowing data to be exposed or hidden at the row level or column level.
The following table shows the key similarities and differences between tables, regular views, cached query results, and materialized views:
| | Performance Benefits | Security Benefits | Simplifies Query Logic | Supports Clustering | Uses Storage | Uses Credits for Maintenance | Notes |
| Cached query result | ✔ | | | | | | Used only if data has not changed and if query only uses deterministic functions (e.g. not CURRENT_DATE). |
| Materialized view | ✔ | ✔ | ✔ | ✔ | ✔ | ✔ | Storage and maintenance requirements typically result in increased costs. |
Creating and Using Materialized Views¶
This section provides information about creating and using materialized views.
Materialized View DDL and DML¶
Materialized views are first-class database objects. Snowflake provides the following DDL commands for creating and maintaining materialized views:
- CREATE MATERIALIZED VIEW
- ALTER MATERIALIZED VIEW
- DROP MATERIALIZED VIEW
- DESCRIBE MATERIALIZED VIEW
- SHOW MATERIALIZED VIEWS
Note that Snowflake does not allow standard DML on materialized views, but provides the following special DML command for deleting all the data in a materialized view so that it can be refreshed:
- TRUNCATE MATERIALIZED VIEW
Also, you can use the standard GRANT and REVOKE commands to grant and revoke privileges on materialized views.
For more details about privileges on materialized views, see Access Control Privileges (in this topic).
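The following sketch exercises several of the DDL commands listed above. It assumes that a materialized view named `mv1` already exists in the current schema; the name is only a placeholder.

```sql
-- Assumes a materialized view named mv1 exists in the current schema.

-- List materialized views whose names match a pattern.
SHOW MATERIALIZED VIEWS LIKE 'mv1';

-- Show the columns of the materialized view.
DESCRIBE MATERIALIZED VIEW mv1;

-- Suspend and later resume background maintenance.
ALTER MATERIALIZED VIEW mv1 SUSPEND;
ALTER MATERIALIZED VIEW mv1 RESUME;

-- Remove the materialized view when it is no longer needed.
DROP MATERIALIZED VIEW mv1;
```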
Suppose that, every day, you run a query Q that includes a subquery S. If S is resource-intensive and queries data that changes only once a week, then you could improve performance of the outer query Q by running S and caching the results in a table named T:
- You would update the table only once a week.
- The rest of the time, when you run Q, it would reference the subquery results of S that were stored in the table.
This would work well as long as the results of subquery S (and thus the contents of table T) change predictably (e.g. at the same time every week).
However, if the results of S change unpredictably, but still rarely, then caching the results in a table is risky; sometimes your query Q will return out-of-date results if table T is out-of-date.
Ideally, you’d like a special type of cache for results that change rarely, but for which the timing of the change is unpredictable. Looking at it another way, you’d like to force your subquery S to be re-run (and your table T to be updated) when necessary.
A materialized view implements an approximation of the best of both worlds. You define a query for your materialized view, and the results of the query are cached (as though they were stored in an internal table), but Snowflake updates the cache when the table that the materialized view is defined on is updated. Thus, your subquery results are readily available for fast performance.
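To make the comparison concrete, here is a minimal sketch of both approaches. The `orders` table, its columns, and the view name `s_results` are hypothetical; `t` plays the role of the manually maintained table T described above.

```sql
-- Hypothetical base table queried by subquery S.
CREATE TABLE orders (region VARCHAR, amount NUMBER(12,2));

-- Manual approach: cache the results of S in table T.
-- You must re-run this statement yourself whenever the data changes.
CREATE OR REPLACE TABLE t AS
    SELECT region, SUM(amount) AS total_amount
    FROM orders
    GROUP BY region;

-- Materialized view approach: the same S, but Snowflake keeps the
-- stored results consistent with the orders table.
CREATE MATERIALIZED VIEW s_results AS
    SELECT region, SUM(amount) AS total_amount
    FROM orders
    GROUP BY region;

-- The outer query Q reads the pre-computed results.
SELECT region
    FROM s_results
    WHERE total_amount > 1000000;
```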
As a less abstract example, suppose that you run a small branch of a large pharmacy, and your branch stocks hundreds of medications out of a total of tens of thousands of FDA-approved medications.
Suppose also that you have a complete list of all medications that each of your customers takes, and that almost all of those customers order only medicines that are in stock (i.e. special orders are rare).
In this scenario, you could create a materialized view that lists only the interactions among medicines that you keep in stock. When a customer orders a medicine that she has never used before, if both that medicine and all of the other medicines that she takes are covered by your materialized view, then you don’t need to check the entire FDA database for drug interactions; you can just check the materialized view, so the search is faster.
Also, note that you can use this materialized view by itself, or you can use it in a join.
Continuing with the pharmacy example, suppose that you have one table that lists all of the medicines that each of your customers takes; you can join that table to the materialized view of drug interactions to find out which of the customer’s current medications might interact with the new medication. You might use an outer join to make sure that you list all of the customer’s medicines, whether or not they are in your materialized view; if the outer join shows that any of the current medicines are not in the materialized view, you can re-run the query on the full drug interactions table.
General Usage Notes¶
Whenever possible, use the qualified name for the base table referenced in a materialized view. This insulates the view from changes that can invalidate the view, such as moving the base table to a different schema from the view (or vice versa).
If the name of the base table isn’t qualified and the table or view is moved to a different schema, the reference becomes invalid.
When a materialized view is first created, Snowflake performs the equivalent of a CTAS operation. This means that the CREATE MATERIALIZED VIEW statement might take a while to complete.
Maintenance of materialized views is performed by a background process and the details are not predictable by the user. If maintenance falls behind, queries may run more slowly than they would if the views were up-to-date. However, the results will always be correct; if parts of the view are out of date, Snowflake skips those portions and looks up data in the base table if necessary.
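For instance, a view definition that uses a qualified base table name might look like the sketch below. The database and schema names (`my_db`, `raw_data`, `reporting`) are hypothetical placeholders.

```sql
-- Hypothetical database and schema names, for illustration only.
CREATE MATERIALIZED VIEW my_db.reporting.mv_inventory AS
    SELECT product_ID, wholesale_price
    FROM my_db.raw_data.inventory;  -- fully qualified base table name
```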
If you suspend maintenance of a view, you should not query the view until you resume maintenance.
SHOW VIEWS returns both materialized and regular views.
INFORMATION_SCHEMA.VIEWS displays materialized views. Note that the CHECK_OPTION column always displays NONE and the IS_UPDATABLE column always displays NO because materialized views are not updatable.
INFORMATION_SCHEMA.TABLES does not show materialized views. This is to avoid the object being returned twice in queries on the view.
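A quick way to see this behavior is sketched below. It assumes a materialized view named `mv1` created with an unquoted identifier, so that its name is stored in uppercase.

```sql
-- Lists both regular and materialized views in the current schema.
SHOW VIEWS;

-- Look up the materialized view in the Information Schema.
-- Per the notes above, CHECK_OPTION shows NONE and IS_UPDATABLE shows NO.
SELECT table_name, check_option, is_updatable
    FROM INFORMATION_SCHEMA.VIEWS
    WHERE table_name = 'MV1';
```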
Limitations on Creating Materialized Views¶
These are current limitations; some of them may be removed or changed in future versions.
The following limitations apply to creating materialized views:
- A materialized view can query only a single table or single materialized view. (Note that a self-join is also not allowed.) However, you can create more complex views that take advantage of materialized views by creating a non-materialized view that references multiple materialized views.
- A materialized view cannot query:
- A non-materialized view.
- A UDTF (user-defined table function).
- A materialized view cannot include:
- Window functions.
- HAVING clauses.
- LIMIT clauses.
- GROUP BY keys that are not within the SELECT list. All GROUP BY keys in a materialized view must be part of the SELECT list.
- Nesting of subqueries within a materialized view.
- There are limitations on aggregate functions in a materialized view definition.
- Functions used in a materialized view must be deterministic. For example, using CURRENT_TIME or CURRENT_TIMESTAMP is not permitted.
- Secure materialized views are not supported; a view can be secure or materialized, but not both.
- The query cannot contain set operators (e.g. INTERSECT).
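One possible way to work within these limitations is to keep the materialized view definition simple and apply the unsupported clauses when you query the view. The sketch below assumes a hypothetical `sales_by_day` table; instead of a HAVING clause in the view, the filter on the aggregate is applied in the query against the view.

```sql
-- Hypothetical base table (names are illustrative only).
CREATE TABLE sales_by_day (
    store_ID INTEGER,
    sale_date DATE,
    amount NUMBER(12,2)
);

-- The view contains only a GROUP BY; the GROUP BY key (store_ID)
-- appears in the SELECT list, as required.
CREATE MATERIALIZED VIEW store_totals AS
    SELECT store_ID, SUM(amount) AS total_amount
    FROM sales_by_day
    GROUP BY store_ID;

-- The filter that would have been a HAVING clause (and the LIMIT)
-- are applied when querying the view.
SELECT store_ID, total_amount
    FROM store_totals
    WHERE total_amount > 10000
    ORDER BY total_amount DESC
    LIMIT 10;
```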
Limitations on Using Materialized Views¶
These are current limitations; some of them may be removed or changed in future versions.
The following limitations apply to using materialized views:
To ensure that materialized views stay consistent with the base table on which they are defined, you cannot perform most DML operations on a materialized view itself. For example, you cannot insert rows directly into a materialized view (although of course you can insert rows into the base table). The prohibited DML operations include INSERT, UPDATE, and DELETE.
Truncating a materialized view is supported, but is not generally recommended. For more details, see TRUNCATE MATERIALIZED VIEW.
Cloning of materialized views is not supported.
Time Travel is not currently supported on materialized views.
Creating a Materialized View¶
This section contains a basic example for the purpose of getting started using materialized views:
CREATE OR REPLACE MATERIALIZED VIEW mv1 AS
    SELECT MyResourceIntensiveFunction(binary_col) FROM table1;

SELECT * FROM mv1;
More detailed examples are provided in Examples (in this topic).
Access Control Privileges¶
There are two types of privileges that are related to materialized views:
- Privileges directly on the materialized view itself.
- Privileges on the database objects (e.g. tables) that the materialized view accesses.
Privileges on a Materialized View¶
Similar to other database objects (tables, views, UDFs, etc.), materialized views are owned by a role and have privileges that can be granted to other roles.
Privileges on a materialized view are similar to privileges on a table. However, since most DML operations do not apply to materialized views, DML-related privileges do not apply.
Privileges on the Database Objects Accessed by the Materialized View¶
As with non-materialized views, a user who wishes to access a materialized view needs privileges only on the view, not on the underlying object(s) that the view references.
Maintenance Costs for Materialized Views¶
Materialized views typically impact your costs for both storage and compute resources:
Storage: Each materialized view stores query results, which adds to the monthly storage usage for your account.
Compute resources: In order to prevent materialized views from becoming out-of-date, Snowflake performs automatic background maintenance of materialized views. When a base table changes, all materialized views defined on the table are updated by a background service that uses compute resources provided by Snowflake.
These updates can consume significant resources, resulting in increased credit usage. However, Snowflake ensures efficient credit usage by only billing your account for the actual resources used. Billing is calculated in 1-second increments.
Estimating and Controlling Costs¶
There are no tools to estimate the costs of maintaining materialized views. In general, the costs are proportional to:
- The amount of data that changes in each base table.
- The number of materialized views created on each base table, and the amount of data that changes in each of the views.
You can control the cost of maintaining materialized views by carefully choosing how many views to create, which tables to create them on, and so on.
You can also control costs by suspending or resuming the materialized view; however, suspending maintenance typically only defers costs, rather than reducing them.
If you are concerned about the costs associated with maintaining materialized views, we recommend starting slowly with this feature (i.e. create only a few materialized views on selected tables) and monitor the costs over time.
You can view the billing costs for maintaining materialized views using either the web interface or SQL:
The credit costs are tracked in a Snowflake-managed virtual warehouse named MATERIALIZED_VIEW_MAINTENANCE.
Query the MATERIALIZED_VIEW_REFRESH_HISTORY table function in the Information Schema.
You must use the ACCOUNTADMIN role to access this function.
The call to the function should be wrapped in TABLE(), for example:
SELECT * FROM TABLE(INFORMATION_SCHEMA.MATERIALIZED_VIEW_REFRESH_HISTORY());
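To get a per-view summary rather than raw refresh records, a query along the following lines may help. This is a sketch that assumes the function accepts a `DATE_RANGE_START` argument and returns `MATERIALIZED_VIEW_NAME` and `CREDITS_USED` columns; check the function's reference documentation before relying on these names.

```sql
-- Assumed argument and column names; verify against the function reference.
-- Sums maintenance credits per materialized view over the last 7 days.
SELECT materialized_view_name,
       SUM(credits_used) AS total_credits
    FROM TABLE(INFORMATION_SCHEMA.MATERIALIZED_VIEW_REFRESH_HISTORY(
        DATE_RANGE_START => DATEADD('day', -7, CURRENT_DATE())))
    GROUP BY materialized_view_name
    ORDER BY total_credits DESC;
```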
Materialized Views and Clustering¶
Defining clustering keys on a materialized view is supported and can increase performance in many situations.
If you cluster both the materialized view(s) and the base table on which the materialized view(s) are defined, you can cluster the materialized view(s) on different columns from the columns used to cluster the base table.
However, in most cases, clustering a subset of the materialized views on a table tends to be more cost-effective than clustering the table itself. If the data in the base table is accessed (almost) exclusively through the materialized views, and (almost) never directly through the base table, then clustering the base table adds costs without adding benefit.
If you are considering clustering both the base table and the materialized views, we recommend that you start by clustering only the materialized views, and that you monitor performance and cost before and after adding clustering to the base table.
Also, if you cluster the base table, then, if possible, cluster it on a column that tends to be in the same order as rows are added to the table (e.g. a timestamp or sequence number that increases over time).
If you plan to create a table, load it, and create one or more clustered materialized views on the table, then Snowflake recommends that you create the materialized views last (after loading as much data as possible). This can save money on the initial data load, since it avoids some extra effort to maintain the clustering of the materialized view the first time that the materialized view is loaded.
If a materialized view is clustered, defining the view with an ORDER BY clause in the SELECT statement is usually not required or recommended (e.g. SELECT ... FROM base_table ... ORDER BY mv_clustering_key). Snowflake will maintain the ordered data based on the view’s clustering key.
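The recommended order of operations can be sketched as follows. The `sensor_readings` table and the view name are hypothetical, and the single INSERT stands in for whatever bulk load you actually use.

```sql
-- 1. Create the base table (hypothetical names).
CREATE TABLE sensor_readings (
    sensor_ID INTEGER,
    reading FLOAT,
    reading_time TIMESTAMP
);

-- 2. Load as much data as possible first (placeholder for a real bulk load).
INSERT INTO sensor_readings (sensor_ID, reading, reading_time) VALUES
    (1, 20.5, '2018-09-01 00:01:00'),
    (2, 21.0, '2018-09-01 00:02:00');

-- 3. Create the clustered materialized view last.
--    No ORDER BY is used; the CLUSTER BY clause defines the clustering key.
CREATE MATERIALIZED VIEW readings_by_sensor
    CLUSTER BY (sensor_ID)
    AS
    SELECT sensor_ID, reading, reading_time
    FROM sensor_readings;
```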
Tips for Materialized Views¶
Some tables store data for the most recent time period (e.g. the most recent day or week or month). When you “trim” your base table by deleting old data, the changes to the base table are propagated to the materialized view. Depending upon how the data is distributed across the micro-partitions, this could cause you to pay more for background updates of the materialized views. In some cases, you might be able to reduce costs by doing DELETE operations less frequently (e.g. daily rather than hourly, or hourly rather than every 10 minutes).
If you do not need to keep a specific amount of old data, you should experiment to find the best balance between cost and functionality.
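As a sketch of the less frequent trimming described above, the statement below removes old rows in one daily batch. The `event_log` table, its `event_time` column, and the 30-day retention window are assumptions chosen for illustration.

```sql
-- Hypothetical table and retention window.
-- Running this once a day, rather than every hour, trims old rows in
-- larger and less frequent batches.
DELETE FROM event_log
    WHERE event_time < DATEADD('day', -30, CURRENT_DATE());
```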
Examples¶
This section contains additional examples of creating and using materialized views. For a simple, introductory example, see Creating a Materialized View (in this topic).
Simple Materialized View¶
This first example illustrates a simple materialized view and a simple query on the view.
Create the table and load the data, and create the view:
CREATE TABLE inventory (product_ID INTEGER, wholesale_price FLOAT, description VARCHAR);

CREATE OR REPLACE MATERIALIZED VIEW mv1 AS
    SELECT product_ID, wholesale_price FROM inventory;

INSERT INTO inventory (product_ID, wholesale_price, description) VALUES
    (1, 1.00, 'cog');
Select data from the view:
SELECT product_ID, wholesale_price FROM mv1;
+------------+-----------------+
| PRODUCT_ID | WHOLESALE_PRICE |
|------------+-----------------|
|          1 |               1 |
+------------+-----------------+
Joining a Materialized View¶
You can join a materialized view with a table or another view. This example builds on the previous example by creating an additional table, and then a non-materialized view that shows profits by joining the materialized view to a table:
CREATE TABLE sales (product_ID INTEGER, quantity INTEGER, price FLOAT);

INSERT INTO sales (product_ID, quantity, price) VALUES
    (1, 1, 1.99);

CREATE OR REPLACE VIEW profits AS
    SELECT m.product_ID,
           SUM(IFNULL(s.quantity, 0)) AS quantity,
           SUM(IFNULL(s.quantity * (s.price - m.wholesale_price), 0)) AS profit
    FROM mv1 AS m LEFT OUTER JOIN sales AS s ON s.product_ID = m.product_ID
    GROUP BY m.product_ID;
Select data from the view:
SELECT * FROM profits;
+------------+----------+--------+
| PRODUCT_ID | QUANTITY | PROFIT |
|------------+----------+--------|
|          1 |        1 |   0.99 |
+------------+----------+--------+
Suspending Updates to a Materialized View¶
The following example temporarily suspends the use (and maintenance) of the mv1 materialized view, and shows that queries on that view will generate an error message while the materialized view is suspended:
ALTER MATERIALIZED VIEW mv1 SUSPEND;

INSERT INTO inventory (product_ID, wholesale_price, description) VALUES
    (2, 2.00, 'sprocket');

INSERT INTO sales (product_ID, quantity, price) VALUES
    (2, 10, 2.99),
    (2, 1, 2.99);
Select data from the materialized view:
SELECT * FROM profits ORDER BY product_ID;
Output:
002037 (42601): SQL compilation error:
Failure during expansion of view 'PROFITS': SQL compilation error:
Failure during expansion of view 'MV1': SQL compilation error: Materialized View MV1 is invalid.
Resume:
ALTER MATERIALIZED VIEW mv1 RESUME;
Select data from the materialized view:
SELECT * FROM profits ORDER BY product_ID;
+------------+----------+--------+
| PRODUCT_ID | QUANTITY | PROFIT |
|------------+----------+--------|
|          1 |        1 |   0.99 |
|          2 |       11 |  10.89 |
+------------+----------+--------+
Clustering a Materialized View¶
This example creates a clustered materialized view and uses it.
This creates two tables that track information about segments of a pipeline (e.g. for natural gas).
CREATE TABLE pipeline_segments (
    segment_ID BIGINT,
    material VARCHAR,         -- e.g. copper, cast iron, PVC.
    installation_year DATE,   -- older pipes are more likely to be corroded.
    rated_pressure FLOAT      -- maximum recommended pressure at installation time.
    );

INSERT INTO pipeline_segments
    (segment_ID, material, installation_year, rated_pressure)
  VALUES
    (1, 'PVC', '1994-01-01'::DATE, 60),
    (2, 'cast iron', '1950-01-01'::DATE, 120)
    ;

CREATE TABLE pipeline_pressures (
    segment_ID BIGINT,
    pressure_psi FLOAT,       -- pressure in Pounds per Square Inch
    measurement_timestamp TIMESTAMP
    );

INSERT INTO pipeline_pressures
    (segment_ID, pressure_psi, measurement_timestamp)
  VALUES
    (2, 10, '2018-09-01 00:01:00'),
    (2, 95, '2018-09-01 00:02:00')
    ;
The pipeline segments don’t change very frequently (new segments are not added frequently), so the pipeline_segments table is a good choice for a materialized view.
CREATE MATERIALIZED VIEW vulnerable_pipes
    (segment_ID, installation_year, rated_pressure)
    CLUSTER BY (segment_ID)
    AS
    SELECT segment_ID, installation_year, rated_pressure
    FROM pipeline_segments
    WHERE material = 'cast iron' AND installation_year < '1980-01-01'::DATE;
If the materialized view had not been created with explicit clustering, then here is how to add clustering later.
ALTER MATERIALIZED VIEW vulnerable_pipes CLUSTER BY (installation_year);
(New pressure measurements arrive frequently (perhaps every 10 seconds), so maintaining a materialized view on the pressure measurements would be expensive. Therefore, even though high performance (fast retrieval) of recent pressure data is important, the pipeline_pressures table starts without a materialized view.)
Create a (non-materialized) view that combines information from the materialized view and from the pipeline_pressures table:
CREATE VIEW high_risk AS
    SELECT seg.segment_ID,
           installation_year,
           measurement_timestamp::DATE AS measurement_date,
           DATEDIFF('YEAR', installation_year::DATE, measurement_timestamp::DATE) AS age,
           rated_pressure - age AS safe_pressure,
           pressure_psi AS actual_pressure
    FROM vulnerable_pipes AS seg INNER JOIN pipeline_pressures AS psi
        ON psi.segment_ID = seg.segment_ID
    WHERE pressure_psi > safe_pressure
    ;
Now list the high-risk pipe segments. This should show pipeline segment_id == 2, which is old and made of a material that corrodes. This pipe segment has never experienced pressure higher than the maximum pressure rating at the time it was installed, but because of the potential for corrosion, its “safe limit” has declined over time, and the highest pressure it has experienced is higher than the pressure that was recommended for a pipe as old as the pipe was at the time of the pressure measurement.
SELECT * FROM high_risk;
Output:
+------------+-------------------+------------------+-----+---------------+-----------------+
| SEGMENT_ID | INSTALLATION_YEAR | MEASUREMENT_DATE | AGE | SAFE_PRESSURE | ACTUAL_PRESSURE |
|------------+-------------------+------------------+-----+---------------+-----------------|
|          2 | 1950-01-01        | 2018-09-01       |  68 |            52 |              95 |
+------------+-------------------+------------------+-----+---------------+-----------------+
Failure during expansion of view '<name>': SQL compilation error: Materialized View <name> is invalid.¶
In many cases, this is caused by a change to the underlying table that the materialized view is based on. For example, if the table is dropped, or if the materialized view refers to a table column, but the column has been dropped, this error is returned.
If the table has been dropped and is not going to be re-created, then you probably should drop the view.
If the table has been modified, but still exists, you might be able to drop and re-create the materialized view, using the columns that remain.
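As a sketch of the last case, suppose the wholesale_price column has been dropped from the inventory table used in the examples in this topic, which invalidates mv1. One way to recover is to drop the view and re-create it using only columns that still exist; the choice of the description column here is an assumption for illustration.

```sql
-- Assumes the inventory table and mv1 view from the examples in this topic,
-- and that the wholesale_price column was dropped from inventory.
DROP MATERIALIZED VIEW mv1;

-- Re-create the view using only the columns that remain in the base table.
CREATE MATERIALIZED VIEW mv1 AS
    SELECT product_ID, description
    FROM inventory;
```

Any regular views that referenced the dropped column through mv1 would also need to be revised.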