Materialized view
Materialized View Source Table - Base Data Source
Create the source table. Because our goal is to report aggregated data rather than individual records, we can parse the incoming data, pass it on to the materialized views, and discard the raw rows. This meets our goals and saves storage space, so we use the Null table engine.
CREATE DATABASE IF NOT EXISTS analytics;
CREATE TABLE analytics.hourly_data
(
`domain_name` String,
`event_time` DateTime,
`count_views` UInt64
)
ENGINE = Null;
Note: A materialized view can be created on a Null table. Data written to the table is still passed to the view, but the original raw data is discarded.
Monthly summary tables and materialized views
For the first materialized view, you need to create the target table (in this case analytics.monthly_aggregated_data). The example stores the sum of the views by month and domain name.
CREATE TABLE analytics.monthly_aggregated_data
(
`domain_name` String,
`month` Date,
`sumCountViews` AggregateFunction(sum, UInt64)
)
ENGINE = AggregatingMergeTree
ORDER BY (domain_name, month);
The materialized view that forwards data to the target table is as follows:
CREATE MATERIALIZED VIEW analytics.monthly_aggregated_data_mv
TO analytics.monthly_aggregated_data
AS
SELECT
toDate(toStartOfMonth(event_time)) AS month,
domain_name,
sumState(count_views) AS sumCountViews
FROM analytics.hourly_data
GROUP BY domain_name, month;
Annual summary tables and materialized views
Now, create a second materialized view that is linked to the previous target table, monthly_aggregated_data.
First, create a new target table that will store the sum of views aggregated per domain per year.
CREATE TABLE analytics.year_aggregated_data
(
`domain_name` String,
`year` UInt16,
`sumCountViews` UInt64
)
ENGINE = SummingMergeTree()
ORDER BY (domain_name, year);
Next, create the materialized view. This step defines the cascade: the FROM clause uses the monthly_aggregated_data table, which means the data flow is:
1. Data arrives in the hourly_data table.
2. ClickHouse forwards the received data to the target table of the first materialized view, monthly_aggregated_data.
3. Finally, the data received in step 2 is forwarded to year_aggregated_data.
CREATE MATERIALIZED VIEW analytics.year_aggregated_data_mv
TO analytics.year_aggregated_data
AS
SELECT
toYear(toStartOfYear(month)) AS year,
domain_name,
sumMerge(sumCountViews) as sumCountViews
FROM analytics.monthly_aggregated_data
GROUP BY domain_name, year;
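To check that the cascade is wired as intended, one option is to list the objects in the analytics database together with the views that depend on each table. This is only a sketch: it assumes the system.tables columns name, engine, and dependencies_table are available in your ClickHouse version.

```sql
-- List the tables and materialized views created so far.
-- Materialized views are reported with engine = 'MaterializedView';
-- dependencies_table (assumed to exist in your version) lists the views fed by each table.
SELECT
    name,
    engine,
    dependencies_table
FROM system.tables
WHERE database = 'analytics'
ORDER BY name;
```

If the cascade is set up as described, hourly_data should list monthly_aggregated_data_mv as a dependency, and monthly_aggregated_data should list year_aggregated_data_mv.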
Attention:
A common misconception when working with materialized views is that the data is read from the table. This is not how materialized views work; the data forwarded is the inserted block, not the final result in the table.
Imagine, in this example, that the engine used in monthly_aggregated_data is a CollapsingMergeTree. The data forwarded to the second materialized view, year_aggregated_data_mv, would not be the final result of the collapsed table; it would be the block of data with the fields defined in the SELECT … GROUP BY.
If you are using CollapsingMergeTree, ReplacingMergeTree, or even SummingMergeTree and plan to create cascading materialized views, you need to understand the limitations described here.
Collecting data
Now it's time to test our cascading materialized views by inserting some data:
INSERT INTO analytics.hourly_data (domain_name, event_time, count_views)
VALUES ('', '2019-01-01 10:00:00', 1),
('', '2019-02-02 00:00:00', 2),
('', '2019-02-01 00:00:00', 3),
('', '2020-01-01 00:00:00', 6);
If you query analytics.hourly_data, you will not find any records, because the table engine is Null. The data has nevertheless been processed.
SELECT * FROM analytics.hourly_data
Output:
domain_name|event_time|count_views|
-----------+----------+-----------+
Results
If you query the sumCountViews field of the target table, you will see the value represented in binary (in some terminals), because it is stored not as a number but as an AggregateFunction type. To get the final result of the aggregation, use the -Merge suffix.
With the following query, the sumCountViews values are not displayed properly:
SELECT sumCountViews FROM analytics.monthly_aggregated_data
Output:
sumCountViews|
-------------+
|
|
|
Use the Merge suffix to get the sumCountViews value:
SELECT sumMerge(sumCountViews) as sumCountViews
FROM analytics.monthly_aggregated_data;
Output:
sumCountViews|
-------------+
12|
In the AggregatingMergeTree table we defined the AggregateFunction as sum, so we use sumMerge. If the AggregateFunction were avg, we would use avgMerge, and so on.
SELECT month, domain_name, sumMerge(sumCountViews) as sumCountViews
FROM analytics.monthly_aggregated_data
GROUP BY domain_name, month
Output:
month |domain_name |sumCountViews|
----------+--------------+-------------+
2020-01-01|| 6|
2019-01-01|| 1|
2019-02-01|| 5|
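The pairing between -State and -Merge functions can be tried without any tables. The following sketch uses the built-in numbers() table function as stand-in data (an assumption for illustration, not part of the example schema): the inner query builds states, and the outer query finalizes each one with the -Merge function of the same name.

```sql
-- Inner query: build intermediate states over the numbers 0..9.
-- Outer query: finalize each state with the matching -Merge function.
SELECT
    sumMerge(s) AS total,   -- pairs with sumState, 0 + 1 + ... + 9 = 45
    avgMerge(a) AS mean,    -- pairs with avgState, 45 / 10 = 4.5
    maxMerge(m) AS largest  -- pairs with maxState, 9
FROM
(
    SELECT
        sumState(number) AS s,
        avgState(number) AS a,
        maxState(number) AS m
    FROM numbers(10)
);
```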
Now we can check whether the materialized views meet the goals we defined.
With the data stored in the target table monthly_aggregated_data, we can get the data aggregated by month for each domain name:
SELECT month, domain_name, sumMerge(sumCountViews) as sumCountViews
FROM analytics.monthly_aggregated_data
GROUP BY domain_name, month;
Output:
month |domain_name |sumCountViews|
----------+--------------+-------------+
2020-01-01|| 6|
2019-01-01|| 1|
2019-02-01|| 5|
The data aggregated by year for each domain name:
SELECT year, domain_name, sum(sumCountViews)
FROM analytics.year_aggregated_data
GROUP BY domain_name, year;
Output:
year|domain_name |sum(sumCountViews)|
----+--------------+------------------+
2019|| 6|
2020|| 6|
Combining multiple source tables into a single target table
Materialized views can also be used to combine multiple source tables into a single target table. This is useful for creating a materialized view that behaves like UNION ALL logic.
First, create two source tables representing different sets of metrics:
CREATE TABLE
(
`event_time` DateTime,
`domain_name` String
) ENGINE = MergeTree ORDER BY (domain_name, event_time);
CREATE TABLE
(
`event_time` DateTime,
`domain_name` String
) ENGINE = MergeTree ORDER BY (domain_name, event_time);
Then create the target table with the combined set of metrics:
CREATE TABLE analytics.daily_overview
(
`on_date` Date,
`domain_name` String,
`impressions` SimpleAggregateFunction(sum, UInt64),
`clicks` SimpleAggregateFunction(sum, UInt64)
) ENGINE = AggregatingMergeTree ORDER BY (on_date, domain_name);
Create two materialized views that point to the same target table. You do not need to explicitly include the missing columns:
CREATE MATERIALIZED VIEW analytics.daily_impressions_mv
TO analytics.daily_overview
AS
SELECT
toDate(event_time) AS on_date,
domain_name,
count() AS impressions,
0 clicks -- <<<--- if this column is omitted, clicks still defaults to 0
FROM
GROUP BY toDate(event_time) AS on_date, domain_name;
CREATE MATERIALIZED VIEW analytics.daily_clicks_mv
TO analytics.daily_overview
AS
SELECT
toDate(event_time) AS on_date,
domain_name,
count() AS clicks,
0 impressions -- <<<--- if this column is omitted, impressions still defaults to 0
FROM
GROUP BY toDate(event_time) AS on_date, domain_name;
Now, when values are inserted, they are aggregated into the corresponding columns of the target table:
INSERT INTO (domain_name, event_time)
VALUES ('', '2019-01-01 00:00:00'),
('', '2019-01-01 12:00:00'),
('', '2019-02-01 00:00:00'),
('', '2019-03-01 00:00:00')
;
INSERT INTO (domain_name, event_time)
VALUES ('', '2019-01-01 00:00:00'),
('', '2019-01-01 12:00:00'),
('', '2019-03-01 00:00:00')
;
Query the target table:
SELECT
on_date,
domain_name,
sum(impressions) AS impressions,
sum(clicks) AS clicks
FROM
analytics.daily_overview
GROUP BY
on_date,
domain_name
;
Output:
on_date |domain_name |impressions|clicks|
----------+--------------+-----------+------+
2019-01-01|| 2| 2|
2019-03-01|| 1| 1|
2019-02-01|| 1| 0|
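The query above uses sum() with GROUP BY because each materialized view writes its own rows. Until ClickHouse merges the data parts in the background, impressions and clicks for the same (on_date, domain_name) may be stored in separate rows. The sketch below looks at the stored rows and then forces a merge; treat it as an illustration, since the rows you see depend on whether merges have already run.

```sql
-- Rows as currently stored. Before parts are merged, the same key may appear
-- more than once: one row written by each materialized view.
SELECT on_date, domain_name, impressions, clicks
FROM analytics.daily_overview
ORDER BY on_date, domain_name;

-- Optionally force a merge. This can be expensive on large tables.
OPTIMIZE TABLE analytics.daily_overview FINAL;

-- After the merge, rows with the same sorting key are combined.
SELECT on_date, domain_name, impressions, clicks
FROM analytics.daily_overview
ORDER BY on_date, domain_name;
```

Because merges run at an unspecified time, queries should keep aggregating with sum() and GROUP BY rather than rely on the table being fully merged.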
Reference Links
/docs/en/guides/developer/cascading-materialized-views
AggregateFunction
Aggregate functions have an implementation-defined intermediate state that can be serialized to an AggregateFunction(...) data type and stored in a table, usually by means of a materialized view. A common way to produce an aggregate function state is to call the aggregate function with the -State suffix. To get the final result of the aggregation later, you must use the same aggregate function with the -Merge suffix.
AggregateFunction(name, types_of_arguments...) is a parametric data type.
Parameters:
- name: the name of the aggregate function. If the function is parametric, specify its parameters too.
- types_of_arguments: the types of the aggregate function arguments.
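One way to see how the function name, its parameters, and its argument types end up in the type is to ask ClickHouse for the type name of a few states. This is a small sketch over the built-in numbers() table function, used here only as stand-in data.

```sql
-- Each column reports a type of the form AggregateFunction(name, types_of_arguments...).
-- For a parametric function such as quantiles, the parameters appear inside the name part.
SELECT
    toTypeName(countState())                                AS count_type,
    toTypeName(avgState(toFloat64(number)))                 AS avg_type,
    toTypeName(quantilesState(0.5, 0.9)(toFloat64(number))) AS quantiles_type
FROM numbers(10);
```

The reported type names are what a target table column has to declare in order to store the corresponding state.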
Example
CREATE TABLE testdb.aggregated_test_tb
(
`__name__` String,
`count` AggregateFunction(count),
`avg_val` AggregateFunction(avg, Float64),
`max_val` AggregateFunction(max, Float64),
`time_max` AggregateFunction(argMax, DateTime, Float64),
`mid_val` AggregateFunction(quantiles(0.5, 0.9), Float64)
) ENGINE = AggregatingMergeTree()
ORDER BY (__name__);
Note: If the ORDER BY (__name__) clause is omitted from the SQL above, execution reports an error similar to the following:
SQL Error [42]: ClickHouse exception, code: 42, host: 192.168.88.131, port: 8123; Code: 42, () = DB::Exception: Storage AggregatingMergeTree requires 3 to 4 parameters:
name of column with date,
[sampling element of primary key],
primary key expression,
index granularity
Create a data source table and insert test data
CREATE TABLE testdb.test_tb
(
`__name__` String,
`create_time` DateTime,
`val` Float64
) ENGINE = MergeTree()
PARTITION BY toStartOfWeek(create_time)
ORDER BY (__name__, create_time);
INSERT INTO testdb.test_tb(`__name__`, `create_time`, `val`) VALUES
('xiaoxiao', now(), 80.5),
('xiaolin', addSeconds(now(), 10), 89.5),
('xiaohong', addSeconds(now(), 20), 90.5),
('lisi', addSeconds(now(), 30), 79.5),
('zhangshang', addSeconds(now(), 40), 60),
('wangwu', addSeconds(now(), 50), 65);
Inserting data
Use INSERT SELECT with aggregate functions that carry the -State suffix to insert data. For example, to get the mean of a target column, i.e. avg(target_column), the aggregate function used when inserting the data is avgState. The *State aggregate functions return a state rather than the final value; in other words, they return a value of type AggregateFunction.
INSERT INTO testdb.aggregated_test_tb (`__name__`, `count`, `avg_val`, `max_val`, `time_max`, `mid_val`)
SELECT `__name__`,
countState() AS count,
avgState(val) AS avg_val,
maxState(val) AS max_val,
argMaxState(create_time, val) AS time_max,
quantilesState(0.5, 0.9)(val) AS `mid_val`
FROM testdb.test_tb
GROUP BY `__name__`, toStartOfMinute(create_time);
Attention: Each field in the SELECT statement must either be passed to an aggregate function (such as the val field above) or be left unchanged (such as the __name__ field above). A field that is left unchanged must be included in the GROUP BY clause; otherwise an error similar to the following is reported:
SQL Error [215]: ClickHouse exception, code: 215, host: 192.168.88.131, port: 8123; Code: 215, () = DB::Exception: Column `__name__` is not under aggregate function and not in GROUP BY (version 20.3.5.21 (official build))
Query Data
When querying data from an AggregatingMergeTree table, use a GROUP BY clause and the same aggregate functions as when inserting data, but with the -Merge suffix. For example, if the aggregate function used when inserting data is avgState, the aggregate function used for the query is avgMerge.
An aggregate function with the -Merge suffix takes a set of states, combines them, and returns the result of the complete data aggregation.
For example, the following two queries return the same results:
SELECT `__name__`,
create_time,
avgMerge(avg_val) AS avg_val,
maxMerge(max_val) AS max_val
FROM (
SELECT `__name__`,
toStartOfMinute(create_time) AS create_time,
avgState(val) AS avg_val,
maxState(val) AS max_val
FROM testdb.test_tb
GROUP BY `__name__`, create_time
)
GROUP BY `__name__`, create_time;
SELECT `__name__`,
toStartOfMinute(create_time) AS create_time,
avg(val) AS avg_val,
max(val) AS max_val
FROM testdb.test_tb
GROUP BY `__name__`, create_time;
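The reason for storing states rather than finished values shows up when partial aggregates cover different numbers of rows. The following self-contained sketch (numbers() is stand-in data, and finalizeAggregation is used only to expose the per-part averages) merges two avg states and compares the result with a naive average of the two partial averages.

```sql
-- Two partial states: one over {0, 1} (average 0.5), one over {2, ..., 9} (average 5.5).
SELECT
    avgMerge(s)                 AS merged_avg,   -- 45 / 10 = 4.5, the true average
    avg(finalizeAggregation(s)) AS avg_of_avgs   -- (0.5 + 5.5) / 2 = 3, not the true average
FROM
(
    SELECT avgState(number) AS s FROM numbers(0, 2)
    UNION ALL
    SELECT avgState(number) AS s FROM numbers(2, 8)
);
```

The merged state keeps enough information (here, a sum and a count) to produce the correct result, which a stored final value cannot do.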
Example:
SELECT `__name__`,
countMerge(`count`),
avgMerge(`avg_val`),
maxMerge(`max_val`),
argMaxMerge(`time_max`),
quantilesMerge(0.5, 0.9)(`mid_val`)
FROM testdb.aggregated_test_tb
GROUP BY `__name__`;
Reference Links
/docs/en/sql-reference/data-types/aggregatefunction
AggregatingMergeTree
The engine inherits from MergeTree and changes the logic for merging data parts. ClickHouse replaces all rows that have the same primary key (or, more precisely, the same sorting key) with a single row (within one data part) that stores a combination of the states of the aggregate functions.
Note: A data part is the basic unit in which ClickHouse stores data.
You can use AggregatingMergeTree tables for incremental data aggregation, including for aggregated materialized views.
The engine processes all columns of the following types:
- AggregateFunction
- SimpleAggregateFunction
AggregatingMergeTree is appropriate if it reduces the number of rows by orders of magnitude.
Creating a table
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = AggregatingMergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...]
For a description of the request parameters, see the request description.
Query clauses
Creating an AggregatingMergeTree table requires the same clauses as creating a MergeTree table.
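As a concrete instance of the template above, here is a minimal sketch of an AggregatingMergeTree table that uses the same PARTITION BY and ORDER BY clauses a MergeTree table would. The table name testdb.daily_stats_sketch and its columns are hypothetical and exist only for illustration.

```sql
CREATE DATABASE IF NOT EXISTS testdb;

-- Hypothetical table: one row per (metric, day) after merges.
CREATE TABLE IF NOT EXISTS testdb.daily_stats_sketch
(
    `day`    Date,
    `metric` String,
    `total`  AggregateFunction(sum, Float64),        -- stored as a state, read with sumMerge
    `peak`   SimpleAggregateFunction(max, Float64)   -- stored as a plain value, read with max
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(day)
ORDER BY (metric, day);
```

The sorting key decides which rows are combined during merges, so it should contain exactly the dimensions you want to keep.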
Queries and inserts
To insert data, use an INSERT SELECT query with aggregate -State functions. When selecting data from an AggregatingMergeTree table, use a GROUP BY clause and the same aggregate functions as when inserting data, but with the -Merge suffix.
In the results of a SELECT query, values of AggregateFunction type have an implementation-specific binary representation for all ClickHouse output formats. For example, if you dump data into TabSeparated format with a SELECT query, the dump can be loaded back with an INSERT query.
An example of a materialized view
CREATE DATABASE testdb;
Create the table that holds the raw data:
CREATE TABLE
(
StartDate DateTime64,
CounterID UInt64,
Sign Nullable(Int32),
UserID Nullable(Int32)
) ENGINE = MergeTree
ORDER BY (StartDate, CounterID);
Note: If StartDate DateTime64, above is written as StartDate DateTime64 NOT NULL, the statement fails with the following error:
Expected one of: CODEC, ALIAS, TTL, ClosingRoundBracket, Comma, DEFAULT, MATERIALIZED, COMMENT, token (version 20.3.5.21 (official build))
Next, create an AggregatingMergeTree table that stores AggregateFunction states, which are used to track the total number of visits and the number of unique users.
Create the AggregatingMergeTree table that the materialized view watching the source table will write to, using the AggregateFunction type:
CREATE TABLE testdb.agg_visits (
StartDate DateTime64,
CounterID UInt64,
Visits AggregateFunction(sum, Nullable(Int32)),
Users AggregateFunction(uniq, Nullable(Int32))
)
ENGINE = AggregatingMergeTree() ORDER BY (StartDate, CounterID);
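With this definition, inserting data later fails while pushing to the view testdb.visits_mv, with an error like the following:
SQL Error [70]: ClickHouse exception, code: 70, host: 192.168.88.131, port: 8123; Code: 70, () = DB::Exception: Conversion from AggregateFunction(sum, Int32) to AggregateFunction(sum, Nullable(Int32)) is not supported: while converting source column Visits to destination column Visits: while pushing to view testdb.visits_mv (version 20.3.5.21 (official build))
To avoid the error, declare the aggregate function arguments without Nullable instead: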
CREATE TABLE testdb.agg_visits (
StartDate DateTime64,
CounterID UInt64,
Visits AggregateFunction(sum, Int32),
Users AggregateFunction(uniq, Int32)
)
ENGINE = AggregatingMergeTree() ORDER BY (StartDate, CounterID);
Create a materialized view that populates testdb.agg_visits from the source table:
CREATE MATERIALIZED VIEW testdb.visits_mv TO testdb.agg_visits
AS SELECT
StartDate,
CounterID,
sumState(Sign) AS Visits,
uniqState(UserID) AS Users
FROM
GROUP BY StartDate, CounterID;
Insert data into the source table:
INSERT INTO (StartDate, CounterID, Sign, UserID)
VALUES (1667446031000, 1, 3, 4), (1667446031000, 1, 6, 3);
The data is inserted into both the source table and testdb.agg_visits.
To get the aggregated data, run a statement such as SELECT ... GROUP BY ... against testdb.agg_visits, the target table of the materialized view:
SELECT
StartDate,
sumMerge(Visits) AS Visits,
uniqMerge(Users) AS Users
FROM testdb.agg_visits
GROUP BY StartDate
ORDER BY StartDate;
Output:
StartDate |Visits|Users|
-------------------+------+-----+
2022-11-03 11:27:11| 9| 2|
Add another two records to the source table, this time using a different timestamp for one of them:
INSERT INTO (StartDate, CounterID, Sign, UserID)
VALUES (1669446031000, 2, 5, 10), (1667446031000, 3, 7, 5);
Run the SELECT query again; the output is as follows:
StartDate |Visits|Users|
-------------------+------+-----+
2022-11-03 11:27:11| 16| 3|
2022-11-26 15:00:31| 5| 1|
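Because the states are stored per (StartDate, CounterID), the same table can answer finer-grained questions without touching the raw data. As a sketch, the following variation of the query above also groups by CounterID.

```sql
-- Same aggregation as before, broken down by counter as well as by timestamp.
SELECT
    StartDate,
    CounterID,
    sumMerge(Visits) AS Visits,
    uniqMerge(Users) AS Users
FROM testdb.agg_visits
GROUP BY StartDate, CounterID
ORDER BY StartDate, CounterID;
```

Grouping by a subset of the sorting key, as the earlier query does with StartDate alone, works the same way: the -Merge functions combine whatever states fall into each group.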
Reference Links
/docs/en/engines/table-engines/mergetree-family/aggregatingmergetree