Text/Zhu Jiqian
I have encountered a demand that requires the statistics of data in the near N-week range based on HIVE, for example, the statistics of data indicators in the near 7-week range.
HIVE SQL needs to be used to implement this function, and HIVE SQL does not have PostgreSQL, such as the function to_char((to_date('202550', 'YYYWW') - INTERVAL '5 weeks'), 'yyyyww')) to realize the queries of 202550 and the previous 5 weeks (the 50 and 45 here represent the 50 and 45 weeks of 2025 respectively).
I couldn't find a suitable answer through Baidu and DeepSeek at that time, and I thought about it for several days before I figured out how to solve it.
Since HIVE SQL does not have a function that can directly implement conditional range query for specified week and near N weeks, is there any other way?
The answer is yes.
I created an additional timetable, which has the days and the week corresponding to the days. I can directly generate such a table date_week_table through the code, which will be automatically mapped from 2020 to 2030. The data of this table is as follows:
id | Day | Week |
---|---|---|
1 | 2020-01-01 | 202001 |
2 | 2020-01-02 | 202001 |
3 | 2020-01-03 | 202001 |
4 | 2020-01-04 | 202001 |
5 | 2020-01-05 | 202001 |
6 | 2020-01-06 | 202002 |
7 | 2020-01-07 | 202002 |
8 | 2020-01-08 | 202002 |
....... | ...... | ...... |
When there is this week table, the main table that needs to query the data in the near N-week range is composed of week fields, such as the main table commerce_data is like this.
id | order_id | customer_id | product_id | week | total_amount |
---|---|---|---|---|---|
1 | ORD202315001 | 1001 | 5001 | 202501 | 5 |
2 | ORD202315002 | 1002 | 5002 | 202452 | 10 |
3 | ORD202315003 | 1003 | 5003 | 202451 | 22 |
4 | ORD202315004 | 1004 | 5004 | 202450 | 1 |
5 | ...... | ...... | ...... | ...... | ...... |
At this time, if you need to count the data of the specified week and the previous N weeks, you can implement it based on these two tables.
For example, find out the data from the first week of 2025 nearly 4 weeks ahead.
You can calculate which weeks in the past 4 weeks in the first week of 2025 based on the date_week_table weekly table. HIVE SQL is as follows:
select distinct week from date_week_table where week <= '202501' order by week desc limit 3
Then calculate the data in the first week of 2025 based on the commerce_data main table -
select
sum(total_amount)
from commerce_data
where week in(select distinct week from date_week_table where week <= '202501' order by week desc limit 3)
group by order_id
In this way, you can query the HIVE SQL query for the specified week and the specified week for nearly N weeks.
Of course, some children's shoes may say that since they are just querying the range of nearly N weeks, can they directly use the specified week and then subtract N to make the difference?
There will be a problem here. The first week of 2025, that is, 202501, in the first two weeks, are 202452 and 202451 respectively. If you use the specified week 202501 to directly reduce 2, what you get is not 202451.
The above is just my personal idea. If there are other calculations based on HIVE SQL to calculate the specified week and the near N weeks, you can share the discussion in the message area.