
Unlocking Time Series Data Features for SQL Server 2022


  

When working with time-series data, SQL Server 2022 provides a number of optimizations and features, such as the GENERATE_SERIES function, the DATE_BUCKET function, window functions (such as FIRST_VALUE and LAST_VALUE), and other date and time functions, to process time-series data more efficiently.

 


 

 GENERATE_SERIES function

SQL Server 2022 introduces a new function, GENERATE_SERIES, which is used to generate a sequence of integers.
This function is very useful for generating a series of consecutive values in a query without having to create temporary tables or loops.

GENERATE_SERIES ( start, stop [, step ] )
start: the start value of the sequence.
stop: the end value of the sequence (inclusive).
step: the increment or decrement of each step (optional). If omitted, the default is 1.

Usage scenarios include quickly generating a series of data for testing or populating a table or combining date functions to generate a series of date values.
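
For instance, a minimal call returns one row per integer, in a column named value:

SELECT value FROM GENERATE_SERIES(1, 5);  --returns 1, 2, 3, 4, 5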

 

Typical example

The following query generates a result set of 21 rows, each showing a time in one-minute increments starting at '2019-02-28 13:45:23'.

SELECT DATEADD(MINUTE, s.value, '2019-02-28 13:45:23') AS [Interval]
FROM GENERATE_SERIES(0, 20, 1) AS s;

For each value in the series, the DATEADD function adds the corresponding number of minutes to the base datetime.
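
Combined with date functions, the series can also serve as a complete reporting timeline. The following minimal sketch (the variable and alias names are illustrative, not from the original article) generates one row per hour of a day; such a timeline can then be LEFT JOINed against aggregated data so that empty intervals still appear:

DECLARE @day DATETIME = '2023-07-26';

--One row per hour of the day: 00:00 through 23:00.
SELECT DATEADD(HOUR, s.value, @day) AS HourStart
FROM GENERATE_SERIES(0, 23) AS s;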


DATE_BUCKET function

SQL Server 2022 introduces a new function, DATE_BUCKET, for grouping datetime values at specified intervals (i.e., bucketing).

This function is useful in scenarios such as time series analysis, data aggregation and segmentation analysis.

DATE_BUCKET ( datepart, number, date [, origin ] )
datepart: the unit of the time interval, for example year, quarter, month, week, day, hour, minute, second.
number: the width of each bucket, as an integer number of datepart units.
date: the date and time value to be assigned to a bucket.
origin: an optional start date that anchors the buckets. If omitted, the default origin is 1900-01-01 00:00:00.


When using the DATE_BUCKET function, the specified interval unit (e.g., YEAR, QUARTER, MONTH, WEEK) together with the origin date determines which bucket a datetime value is assigned to: bucket boundaries are computed by stepping forward from the origin in whole intervals.
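
As a quick illustration (a sketch using WEEK buckets), the same date can land in different buckets depending on the origin; when the origin argument is omitted, DATE_BUCKET anchors the buckets at the default origin of 1900-01-01:

SELECT
    DATE_BUCKET(WEEK, 1, CAST('2019-09-28' AS DATETIME)) AS DefaultOrigin,   --expected 2019-09-23 (weeks anchored at 1900-01-01, a Monday)
    DATE_BUCKET(WEEK, 1, CAST('2019-09-28' AS DATETIME),
                CAST('2019-01-01' AS DATETIME)) AS CustomOrigin;             --expected 2019-09-24 (weeks anchored at a Tuesday)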

 

Typical example

DECLARE @date DATETIME = '2019-09-28 13:45:23';
DECLARE @origin DATETIME = '2019-01-28 13:45:23';
 
SELECT 'Now' AS [BucketName], @date AS [DateBucketValue]
UNION ALL
SELECT 'Year', DATE_BUCKET (YEAR, 1, @date, @origin)
UNION ALL
SELECT 'Quarter', DATE_BUCKET (QUARTER, 1, @date, @origin)
UNION ALL
SELECT 'Month', DATE_BUCKET (MONTH, 1, @date, @origin)
UNION ALL
SELECT 'Week', DATE_BUCKET (WEEK, 1, @date, @origin)


--If the date and time value is as follows:
Now: 2019-09-28 13:45:23


--Grouped by year:
DATE_BUCKET(YEAR, 1, @date, @origin)
Yearly buckets start from 2019-01-28 13:45:23, so 2019-09-28 falls into the bucket spanning 2019-01-28 to 2020-01-28.
Result: 2019-01-28 13:45:23


--Grouped by quarter:
DATE_BUCKET(QUARTER, 1, @date, @origin)
Quarterly buckets start from 2019-01-28 13:45:23, each quarter spanning 3 months.
2019-09-28 falls into the third quarterly bucket (from 2019-07-28 13:45:23 to 2019-10-28 13:45:23).
Result: 2019-07-28 13:45:23


--Grouped by month:
DATE_BUCKET(MONTH, 1, @date, @origin)
Monthly buckets start from 2019-01-28 13:45:23, one bucket per month.
2019-09-28 falls into the ninth bucket (from 2019-09-28 13:45:23 to 2019-10-28 13:45:23).
Result: 2019-09-28 13:45:23


--Grouped by week:
DATE_BUCKET(WEEK, 1, @date, @origin)
Weekly buckets start from 2019-01-28 13:45:23.
2019-09-28 falls into the bucket from 2019-09-23 13:45:23 to 2019-09-30 13:45:23.
Result: 2019-09-23 13:45:23

--When the origin argument is omitted, DATE_BUCKET uses the default origin instead:

SELECT 'Now' AS [BucketName], GETDATE() AS [BucketDate]
UNION ALL
SELECT '5 Minute Buckets', DATE_BUCKET (MINUTE, 5, GETDATE())
UNION ALL
SELECT 'Quarter Hour', DATE_BUCKET (MINUTE, 15, GETDATE());



Now:
BucketName: Now
BucketDate: 2024-07-26 16:14:11.030
This is the current time returned by GETDATE().


5 Minute Buckets:
BucketName: 5 Minute Buckets
BucketDate: 2024-07-26 16:10:00.000
This is the result of grouping the current time into 5-minute intervals. DATE_BUCKET(MINUTE, 5, GETDATE()) returns the start of the 5-minute interval containing the current time. In this example, 16:14:11 falls between 16:10:00 and 16:15:00, so 16:10:00 is returned.


Quarter Hour:
BucketName: Quarter Hour
BucketDate: 2024-07-26 16:00:00.000
This is the result of grouping the current time into 15-minute intervals. DATE_BUCKET(MINUTE, 15, GETDATE()) returns the start of the 15-minute interval containing the current time. In this example, 16:14:11 falls between 16:00:00 and 16:15:00, so 16:00:00 is returned.

 

 

More examples of real-world scenarios

 

Grouping by a customized start date
Suppose we have a series of event times (EventTime) and want to count the number of events per week, with weeks starting from the date '2023-01-01'.

--Create the table Events:

USE [testdb]
GO

CREATE TABLE Events (
    EventID INT PRIMARY KEY,
    EventTime DATETIME
);

INSERT INTO Events (EventID, EventTime) VALUES
(1, '2023-01-02 14:30:00'),
(2, '2023-01-08 09:15:00'),
(3, '2023-01-09 17:45:00'),
(4, '2023-01-15 12:00:00'),
(5, '2023-01-16 08:00:00'),
(6, '2023-01-22 19:30:00'),
(7, '2023-01-29 11:00:00');
    
  
--Count the number of events per week, with weeks anchored at the '2023-01-01' start date.
DECLARE @origin DATETIME = '2023-01-01';

SELECT
    DATE_BUCKET(WEEK, 1, EventTime, @origin) AS WeekStart,
    COUNT(*) AS EventCount
FROM
    Events
GROUP BY
    DATE_BUCKET(WEEK, 1, EventTime, @origin)
ORDER BY
    WeekStart;
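
For the sample rows above, the expected output (weeks anchored at the '2023-01-01' origin) is:

WeekStart                  EventCount
2023-01-01 00:00:00.000    1
2023-01-08 00:00:00.000    2
2023-01-15 00:00:00.000    2
2023-01-22 00:00:00.000    1
2023-01-29 00:00:00.000    1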

 

 

 


Grouping at customized intervals
Suppose we have a sensor data table, SensorReadings:

USE [testdb]
GO


CREATE TABLE SensorReadings (
    ReadingID INT PRIMARY KEY,  --unique identifier
    ReadingTime DATETIME,       --time of the reading
    Value FLOAT                 --value of the reading
);

INSERT INTO SensorReadings (ReadingID, ReadingTime, Value) VALUES
(1, '2023-07-26 10:03:00', 23.5),
(2, '2023-07-26 10:05:00', 24.1),
(3, '2023-07-26 10:09:00', 22.8),
(4, '2023-07-26 10:15:00', 25.0),
(5, '2023-07-26 10:20:00', 23.9),
(6, '2023-07-26 10:27:00', 24.3),
(7, '2023-07-26 10:29:00', 24.5);


--We want to group the data in 10-minute intervals and calculate the average reading for each interval.
SELECT
    DATE_BUCKET(MINUTE, 10, ReadingTime) AS BucketStartTime,
    ROUND(AVG(Value),4) AS AverageValue
FROM
    SensorReadings
GROUP BY
    DATE_BUCKET(MINUTE, 10, ReadingTime)
ORDER BY
    BucketStartTime;
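
For the sample readings above, the expected output (10-minute buckets anchored at the default origin, so boundaries fall on :00, :10, :20, and so on) is:

BucketStartTime            AverageValue
2023-07-26 10:00:00.000    23.4667
2023-07-26 10:10:00.000    25
2023-07-26 10:20:00.000    24.2333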

 

 

The traditional approach requires a Common Table Expression (CTE) to accomplish the same thing.

    
--Query: Grouping at 10-minute intervals and calculating averages
WITH TimeIntervals AS (
    SELECT
        ReadingID,
        ReadingTime,
        Value,
        --Floor the number of minutes since the '2000-01-01' base date to the nearest multiple of 10 minutes.
        DATEADD(MINUTE, (DATEDIFF(MINUTE, '2000-01-01', ReadingTime) / 10) * 10, '2000-01-01') AS BucketStartTime
    FROM
        SensorReadings
)
SELECT
    BucketStartTime,
    ROUND(AVG(Value), 4) AS AverageValue
FROM
    TimeIntervals
GROUP BY
    BucketStartTime
ORDER BY
    BucketStartTime;
    


WITH TimeIntervals AS (...): the Common Table Expression (CTE) computes the BucketStartTime for each record.
DATEDIFF(MINUTE, '2000-01-01', ReadingTime) / 10 calculates the number of minutes from the base time '2000-01-01' to ReadingTime; integer division by 10 gives the index of the 10-minute interval that the point in time falls into.
DATEADD(MINUTE, ..., '2000-01-01') converts this index back to a specific point in time: the start of the interval.
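
As a worked case: for ReadingTime = '2023-07-26 10:27:00', the minute count since '2000-01-01' is floored to a multiple of 10, and DATEADD maps that floored count back to '2023-07-26 10:20:00', the start of its 10-minute bucket. This is the same value that DATE_BUCKET(MINUTE, 10, ReadingTime) returns with the default origin.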


Main query:
Select BucketStartTime and the average of the readings in the corresponding interval.
GROUP BY groups the rows by BucketStartTime and calculates the average for each group.
ORDER BY sorts the results in chronological order.


FIRST_VALUE and LAST_VALUE Window Functions

In previous versions of SQL Server, FIRST_VALUE and LAST_VALUE were already available as window functions that return the first or last value in a partition or window.

SQL Server 2022 enhances these functions with new options, IGNORE NULLS and RESPECT NULLS, for handling NULL values.

 

Basic syntax

FIRST_VALUE
Returns the first value, in the specified order, within the specified window or partition.
FIRST_VALUE ( scalar_expression ) [ IGNORE NULLS | RESPECT NULLS ]
OVER ( [ partition_by_clause ] order_by_clause [ rows_range_clause ] )



LAST_VALUE
Returns the last value, in the specified order, within the specified window or partition.
LAST_VALUE ( scalar_expression ) [ IGNORE NULLS | RESPECT NULLS ]
OVER ( [ partition_by_clause ] order_by_clause [ rows_range_clause ] )



New: IGNORE NULLS and RESPECT NULLS
IGNORE NULLS: skip NULL values when determining the first or last value.
RESPECT NULLS: the default behavior; NULL values in the partition or window are taken into account.

 

Typical example

 

Suppose we have a table MachineTelemetry that contains the following data:

CREATE TABLE MachineTelemetry (
    [timestamp] DATETIME,
    SensorReading FLOAT
);


INSERT INTO MachineTelemetry ([timestamp], SensorReading) VALUES
('2023-07-26 10:00:00', 23.5),
('2023-07-26 10:00:15', 24.1),
('2023-07-26 10:00:30', NULL),
('2023-07-26 10:00:45', 25.0),
('2023-07-26 10:01:00', NULL),
('2023-07-26 10:01:15', 23.9),
('2023-07-26 10:01:30', NULL),
('2023-07-26 10:01:45', 24.3);


Default behavior (contains NULL values)

--Gap analysis using FIRST_VALUE and LAST_VALUE
--Default behavior (contains NULL values)
SELECT 
    [timestamp],
    DATE_BUCKET(MINUTE, 1, [timestamp]) AS [timestamp_bucket],
    SensorReading,
    FIRST_VALUE(SensorReading) OVER (
        PARTITION BY DATE_BUCKET(MINUTE, 1, [timestamp]) 
        ORDER BY [timestamp] 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS [Default_FIRST_VALUE (RESPECT NULLS)],
    LAST_VALUE(SensorReading) OVER (
        PARTITION BY DATE_BUCKET(MINUTE, 1, [timestamp]) 
        ORDER BY [timestamp] 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS [Default_LAST_VALUE (RESPECT NULLS)]
FROM MachineTelemetry
ORDER BY [timestamp];
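
For the sample data, the expected per-bucket endpoints under the default RESPECT NULLS behavior are:

10:00 bucket: FIRST_VALUE = 23.5 and LAST_VALUE = 25, since the first and last rows of that bucket happen to be non-NULL.
10:01 bucket: FIRST_VALUE = NULL (the row at 10:01:00 is NULL) and LAST_VALUE = 24.3.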

 

 

 

Ignore NULL values

--Ignore NULL values
SELECT 
    [timestamp],
    DATE_BUCKET(MINUTE, 1, [timestamp]) AS [timestamp_bucket],
    SensorReading,
    FIRST_VALUE(SensorReading) IGNORE NULLS OVER (
        PARTITION BY DATE_BUCKET(MINUTE, 1, [timestamp]) 
        ORDER BY [timestamp] 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS [First_Reading (IGNORE NULLS)],
    LAST_VALUE(SensorReading) IGNORE NULLS OVER (
        PARTITION BY DATE_BUCKET(MINUTE, 1, [timestamp]) 
        ORDER BY [timestamp] 
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS [Last_Reading (IGNORE NULLS)]
FROM MachineTelemetry
ORDER BY [timestamp];
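
With IGNORE NULLS, the NULL rows no longer mask the endpoints: the 10:00 bucket is expected to still yield 23.5 and 25, while the 10:01 bucket now yields First_Reading = 23.9 and Last_Reading = 24.3.

A common follow-up in gap analysis is to carry the last non-NULL reading forward (LOCF). A minimal sketch, shrinking the window frame so it ends at the current row:

SELECT
    [timestamp],
    SensorReading,
    LAST_VALUE(SensorReading) IGNORE NULLS OVER (
        ORDER BY [timestamp]
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS FilledReading
FROM MachineTelemetry
ORDER BY [timestamp];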

 


Summary

In practice, specialized time-series databases such as InfluxDB are often used for time-series workloads.

InfluxDB uses TSM (Time-Structured Merge Tree) as its storage engine, a variant of the LSM tree optimized specifically for the write and query patterns of time-series data.

SQL Server's time-series functionality, by contrast, runs on a row store engine, which means data is stored and processed row by row.

 

In most scenarios, unless the performance requirements are extreme, SQL Server is entirely adequate for storing time-series data, whereas adopting InfluxDB as well means maintaining an additional technology stack and shouldering a heavier operations burden.

This is especially true given the current trend toward database consolidation: serving time-series, vector, geographic, and JSON data from a single database reduces the operations burden, reuses the existing technology stack, and avoids duplicated infrastructure costs.

 

 

Reference articles

/sessions/event2024/Time_Series_with_SQL_Server_2022

/en-us/sql-server/blog/2023/01/12/working-with-time-series-data-in-sql-server-2022-and-azure-sql/

/sqlservertip/6232/load-time-series-data-with-sql-server/

 

 
This article is copyrighted and may not be reproduced without the author's permission.