⭐️ Quick Link Navigation ⭐️
Server → ☁️ AliCloud promotion address
Live demo → 🐟 Moyu site address
Learning code → 💻 Source code repository address
I. Preface
We have already built the core of this little site: a complete hot search component, from backend to frontend. Next we will keep improving it to make it more attractive and practical. Today's topic is how to fetch the hot search data on a schedule. If the hot search data cannot refresh regularly, the site loses its core value. Previously I used the @Scheduled annotation to implement the timed task, but that approach is not flexible enough, so I decided to replace it with the more flexible XXL-Job component.
II. XXL-Job deployment
XXL-Job is a lightweight distributed task scheduling platform whose core design goals are rapid development, easy learning, light weight, and easy scaling. Its GitHub repository currently has 27.3k stars; it is open source and free, and well worth learning.
1. Code base download
GitHub code repository address
Once downloaded, the source code is structured as follows:
xxl-job-admin: the scheduling center
xxl-job-core: shared dependencies
xxl-job-executor-samples: sample executors (pick the version that fits, use one directly, or use it as a reference to turn an existing project into an executor)
  xxl-job-executor-sample-springboot: Spring Boot version, which manages the executor through Spring Boot; this way is recommended;
  xxl-job-executor-sample-frameless: frameless version;
Scheduling center configuration description (the property keys below are restored from the stock xxl-job-admin application.properties; xxx values are the article's placeholders):
### Scheduling center JDBC link: keep it consistent with the scheduling database created in Section 2.1
spring.datasource.url=jdbc:mysql://127.0.0.1:3306/xxl_job?useUnicode=true&characterEncoding=UTF-8&autoReconnect=true&serverTimezone=Asia/Shanghai
spring.datasource.username=xxx
spring.datasource.password=xxx
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
### Alarm email
spring.mail.host=smtp.qq.com
spring.mail.port=25
spring.mail.username=xxx@qq.com
spring.mail.password=xxx
spring.mail.properties.mail.smtp.auth=true
spring.mail.properties.mail.smtp.starttls.enable=true
spring.mail.properties.mail.smtp.starttls.required=true
spring.mail.properties.mail.smtp.socketFactory.class=javax.net.ssl.SSLSocketFactory
### Scheduling center communication TOKEN [optional]: enabled when non-empty;
xxl.job.accessToken=
### Scheduling center internationalization [required]: default is "zh_CN" (Simplified Chinese); options are "zh_CN" (Simplified Chinese), "zh_TC" (Traditional Chinese) and "en" (English);
xxl.job.i18n=zh_CN
## Scheduling thread pool maximum size [required]
xxl.job.triggerpool.fast.max=200
xxl.job.triggerpool.slow.max=100
### Scheduling center log table retention days [required]: expired logs are cleaned automatically; takes effect when >= 7, otherwise (e.g. -1) auto-cleaning is disabled;
xxl.job.logretentiondays=30
2. Table structure initialization
In the doc/db directory of the code base there is a SQL file (tables_xxl_job.sql) with the table and data initialization statements that must be executed before running XXL-Job.
After executing it, the tables look like this:
3. Launching XXL-Job
Locate XxlJobAdminApplication, start the application, and enter the following in your browser: http://localhost:12000/xxl-job-admin/toLogin. The XXL-Job login screen appears, as follows:
Enter the username: admin; password: 123456
Click Login to enter the main screen, as follows:
III. Customizing Crawler Tasks
XXL-Job is also very simple to use; a single annotation is all it takes. Here is how to use it.
1. Introduce the XXL-Job dependency
In the pom.xml of the summo-sbmy-job module, add:
<!-- xxl-job -->
<dependency>
<groupId>com.xuxueli</groupId>
<artifactId>xxl-job-core</artifactId>
<version>2.4.1</version>
</dependency>
2. XXL-Job configuration
Add the XXL-Job configuration to the application.properties file (the property keys are restored from the stock executor sample; xxl.job.enabled is a custom switch used by the configuration class further below):
# xxl-job
xxl.job.enabled=true
### xxl-job admin address list, such as "http://address" or "http://address01,http://address02"
xxl.job.admin.addresses=http://127.0.0.1:12000/xxl-job-admin
### xxl-job, access token
xxl.job.accessToken=default_token
### xxl-job executor appname
xxl.job.executor.appname=summo-sbmy
### xxl-job executor log-path
xxl.job.executor.logpath=/root/logs/xxl-job/jobhandler
### xxl-job executor log-retention-days
xxl.job.executor.logretentiondays=30
### xxl-job executor registry-address: default use address to registry, otherwise use ip:port if address is null
xxl.job.executor.address=
### xxl-job executor server-info
xxl.job.executor.ip=
xxl.job.executor.port=9999
After the configuration is done, create an XxlJobConfig class under the module's config directory; the code is as follows:
package com.summo.sbmy.job.config; // package path assumed from the summo-sbmy project layout

import com.xxl.job.core.executor.impl.XxlJobSpringExecutor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * xxl-job config
 *
 * @author xuxueli 2017-04-28
 */
@Configuration
public class XxlJobConfig {
    private Logger logger = LoggerFactory.getLogger(XxlJobConfig.class);

    @Value("${xxl.job.admin.addresses}")
    private String adminAddresses;
    @Value("${xxl.job.accessToken}")
    private String accessToken;
    @Value("${xxl.job.executor.appname}")
    private String appname;
    @Value("${xxl.job.executor.address}")
    private String address;
    @Value("${xxl.job.executor.ip}")
    private String ip;
    @Value("${xxl.job.executor.port}")
    private int port;
    @Value("${xxl.job.executor.logpath}")
    private String logPath;
    @Value("${xxl.job.executor.logretentiondays}")
    private int logRetentionDays;

    @Bean
    @ConditionalOnProperty(name = "xxl.job.enabled", havingValue = "true")
    public XxlJobSpringExecutor xxlJobExecutor() {
        logger.info(">>>>>>>>>>> xxl-job config init.");
        XxlJobSpringExecutor xxlJobSpringExecutor = new XxlJobSpringExecutor();
        xxlJobSpringExecutor.setAdminAddresses(adminAddresses);
        xxlJobSpringExecutor.setAppname(appname);
        xxlJobSpringExecutor.setAddress(address);
        xxlJobSpringExecutor.setIp(ip);
        xxlJobSpringExecutor.setPort(port);
        xxlJobSpringExecutor.setAccessToken(accessToken);
        xxlJobSpringExecutor.setLogPath(logPath);
        xxlJobSpringExecutor.setLogRetentionDays(logRetentionDays);
        return xxlJobSpringExecutor;
    }
}
After the configuration and the class are in place, restart the application. If all goes well, you will see an executor registered on the executor management screen of the XXL-Job console, as follows:
3. Registering XXL-Job tasks
Take the Douyin hot search as an example. We originally used the @Scheduled annotation; the code was as follows:
/**
 * Scheduled crawler trigger, executed once every hour
 */
@Scheduled(fixedRate = 1000 * 60 * 60)
public void hotSearch() throws IOException {
    ... ...
}
Now replace the @Scheduled annotation with @XxlJob("douyinHotSearchJob"); the full code is as follows:
package com.summo.sbmy.job.douyin; // package path assumed from the summo-sbmy project layout

import java.io.IOException;
import java.util.List;
import java.util.Random;
import java.util.UUID;
import java.util.stream.Collectors;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.google.common.collect.Lists;
import com.xxl.job.core.biz.model.ReturnT;
import com.xxl.job.core.handler.annotation.XxlJob;
import lombok.extern.slf4j.Slf4j;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.apache.commons.collections4.CollectionUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
// project-internal imports (SbmyHotSearchDO, SbmyHotSearchService, HotSearchConvert,
// HotSearchEnum, and the CACHE_MAP static import) were elided in the original

/**
 * @author summo
 * @version 1.0.0
 * @description Douyin hot search Java crawler code
 * @date 2024-08-09
 */
@Component
@Slf4j
public class DouyinHotSearchJob {

    @Autowired
    private SbmyHotSearchService sbmyHotSearchService;

    @XxlJob("douyinHotSearchJob")
    public ReturnT<String> hotSearch(String param) throws IOException {
        log.info("Douyin hot search crawler task started");
        try {
            // Query the Douyin hot search data (host assumed; it was elided in the original)
            OkHttpClient client = new OkHttpClient().newBuilder().build();
            Request request = new Request.Builder()
                .url("https://www.iesdouyin.com/web/api/v2/hotsearch/billboard/word/")
                .method("GET", null).build();
            Response response = client.newCall(request).execute();
            JSONObject jsonObject = JSON.parseObject(response.body().string());
            JSONArray array = jsonObject.getJSONArray("word_list");
            List<SbmyHotSearchDO> sbmyHotSearchDOList = Lists.newArrayList();
            for (int i = 0, len = array.size(); i < len; i++) {
                // Get one Douyin hot search entry
                JSONObject object = (JSONObject)array.get(i);
                // Build the hot search record
                SbmyHotSearchDO sbmyHotSearchDO = SbmyHotSearchDO.builder()
                    .hotSearchResource(HotSearchEnum.DOUYIN.getCode()).build();
                // Set the title
                sbmyHotSearchDO.setHotSearchTitle(object.getString("word"));
                // Set the Douyin third-party ID
                sbmyHotSearchDO.setHotSearchId(getHashId(HotSearchEnum.DOUYIN.getCode() + sbmyHotSearchDO.getHotSearchTitle()));
                // Set the link (host assumed; it was elided in the original)
                sbmyHotSearchDO.setHotSearchUrl("https://www.douyin.com/search/" + sbmyHotSearchDO.getHotSearchTitle() + "?type=general");
                // Set the heat value
                sbmyHotSearchDO.setHotSearchHeat(object.getString("hot_value"));
                // Ordinal order
                sbmyHotSearchDO.setHotSearchOrder(i + 1);
                sbmyHotSearchDOList.add(sbmyHotSearchDO);
            }
            if (CollectionUtils.isEmpty(sbmyHotSearchDOList)) {
                return ReturnT.SUCCESS;
            }
            // Add the data to the cache
            CACHE_MAP.put(HotSearchEnum.DOUYIN.getCode(),
                sbmyHotSearchDOList.stream().map(HotSearchConvert::toDTOWhenQuery).collect(Collectors.toList()));
            // Persist the data
            sbmyHotSearchService.saveCache2DB(sbmyHotSearchDOList);
            log.info("Douyin hot search crawler task finished");
        } catch (IOException e) {
            log.error("Failed to fetch Douyin data", e);
        }
        return ReturnT.SUCCESS;
    }

    /**
     * Generates a deterministic unique ID from the article title
     *
     * @param title article title
     * @return unique ID
     */
    private String getHashId(String title) {
        long seed = title.hashCode();
        Random rnd = new Random(seed);
        return new UUID(rnd.nextLong(), rnd.nextLong()).toString();
    }
}
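A note on getHashId: seeding a Random with the title's hash instead of calling UUID.randomUUID() makes the ID deterministic, so re-crawling the same title yields the same ID rather than a duplicate row. A minimal standalone sketch (class name hypothetical) to illustrate:

import java.util.Random;
import java.util.UUID;

public class HashIdDemo {
    // Same deterministic scheme as getHashId above: seed a PRNG with the
    // title's hashCode, then build a UUID from two pseudo-random longs.
    static String hashId(String title) {
        Random rnd = new Random(title.hashCode());
        return new UUID(rnd.nextLong(), rnd.nextLong()).toString();
    }

    public static void main(String[] args) {
        // Prints the same UUID twice: identical input -> identical ID.
        System.out.println(hashId("some hot search"));
        System.out.println(hashId("some hot search"));
    }
}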
Click Add Task on the Task Management screen of the XXL-Job console, as follows:
After creating the task, we can trigger it once manually, as follows:
With that, the Douyin hot search task is configured; the other crawler tasks are configured the same way. For example, a CRON expression of 0 0 * * * ? reproduces the original hourly schedule.
IV. Hot search update time
So far we have implemented three hot search components: Baidu, Douyin, and Zhihu. But we don't know when each list was last updated, or whether it is real-time, so we need to display the update time on the component, roughly like this:
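For the frontend to show an update time, the backend cache entry has to carry a refresh timestamp alongside the list, which is exactly what the Bilibili job in the Extra section stores. A minimal sketch of such a cache value object, with class and field names assumed:

import java.util.Date;
import java.util.List;

import lombok.Builder;
import lombok.Data;

/**
 * Minimal sketch (class and field names assumed) of the cache value that
 * pairs a platform's hot search list with its last refresh time, so the
 * frontend can render "updated X minutes ago".
 */
@Data
@Builder
public class HotSearchCacheDTO<T> {
    /** Hot search entries for one platform. */
    private List<T> hotSearchDTOList;
    /** When the crawler last refreshed this list. */
    private Date updateTime;
}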
The optimized component code is as follows:
<template>
  <el-card class="custom-card" v-loading="loading">
    <template #header>
      <div class="card-title">
        <img :src="icon" class="card-title-icon" />
        {{ title }} Hot List
        <span class="update-time">{{ formattedUpdateTime }}</span>
      </div>
    </template>
    <div class="cell-group-scrollable">
      <div
        v-for="item in hotSearchData"
        :key="item.hotSearchOrder"
        :class="getRankingClass(item.hotSearchOrder)"
        class="cell-wrapper"
      >
        <span class="cell-order">{{ item.hotSearchOrder }}</span>
        <span
          class="cell-title hover-effect"
          @click="openLink(item.hotSearchUrl)"
        >
          {{ item.hotSearchTitle }}
        </span>
        <span class="cell-heat">{{ formatHeat(item.hotSearchHeat) }}</span>
      </div>
    </div>
  </el-card>
</template>
<script>
import apiService from "@/config/apiService"; // import path assumed (elided in the original)
export default {
  props: {
    title: String,
    icon: String,
    type: String,
  },
  data() {
    return {
      hotSearchData: [],
      updateTime: null,
      loading: false,
    };
  },
  created() {
    this.fetchData(this.type);
  },
  computed: {
    formattedUpdateTime() {
      if (!this.updateTime) return '';
      const updateDate = new Date(this.updateTime);
      const now = new Date();
      const timeDiff = now - updateDate;
      const minutesDiff = Math.floor(timeDiff / 1000 / 60);
      if (minutesDiff < 1) {
        return 'Just updated';
      } else if (minutesDiff < 60) {
        return `Updated ${minutesDiff} minutes ago`;
      } else if (minutesDiff < 1440) {
        return `Updated ${Math.floor(minutesDiff / 60)} hours ago`;
      } else {
        return updateDate.toLocaleDateString();
      }
    },
  },
  methods: {
    fetchData(type) {
      this.loading = true;
      apiService
        .get("/hotSearch/queryByType?type=" + type)
        .then((res) => {
          // response envelope assumed (elided in the original)
          this.hotSearchData = res.data.data.hotSearchDTOList;
          this.updateTime = res.data.data.updateTime;
        })
        .catch((error) => {
          console.error(error);
        })
        .finally(() => {
          this.loading = false;
        });
    },
    getRankingClass(order) {
      if (order === 1) return "top-ranking-1";
      if (order === 2) return "top-ranking-2";
      if (order === 3) return "top-ranking-3";
      return "";
    },
    formatHeat(heat) {
      // Values such as "123.4万" are already formatted; return them as-is
      if (typeof heat === "string" && heat.includes("万")) {
        return heat;
      }
      let number = parseFloat(heat);
      if (isNaN(number)) {
        return heat;
      }
      if (number < 1000) {
        return number.toString();
      }
      if (number >= 1000 && number < 10000) {
        return (number / 1000).toFixed(1) + "k";
      }
      if (number >= 10000) {
        // "万" is the Chinese unit for ten thousand
        return (number / 10000).toFixed(1) + "万";
      }
    },
    openLink(url) {
      if (url) {
        window.open(url, "_blank");
      }
    },
  },
};
</script>
<style scoped>
.custom-card {
background-color: #ffffff;
border-radius: 10px;
box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
margin-bottom: 20px;
}
.custom-card:hover {
box-shadow: 0 6px 8px rgba(0, 0, 0, 0.25);
}
.el-card__header {
padding: 10px 18px;
display: flex;
justify-content: space-between; /* Added to space out title and update time */
align-items: center;
}
.card-title {
display: flex;
align-items: center;
font-weight: bold;
font-size: 16px;
flex-grow: 1;
}
.card-title-icon {
fill: currentColor;
width: 24px;
height: 24px;
margin-right: 8px;
}
.update-time {
font-size: 12px;
color: #b7b3b3;
margin-left: auto; /* Ensures it is pushed to the far right */
}
.cell-group-scrollable {
max-height: 350px;
overflow-y: auto;
padding-right: 16px;
flex: 1;
}
.cell-wrapper {
display: flex;
align-items: center;
padding: 8px 8px;
border-bottom: 1px solid #e8e8e8;
}
.cell-order {
width: 20px;
text-align: left;
font-size: 16px;
font-weight: 700;
margin-right: 8px;
color: #7a7a7a;
}
.cell-heat {
min-width: 50px;
text-align: right;
font-size: 12px;
color: #7a7a7a;
}
.cell-title {
font-size: 13px;
color: #495060;
line-height: 22px;
flex-grow: 1;
overflow: hidden;
text-align: left;
text-overflow: ellipsis;
}
.top-ranking-1 .cell-order {
color: #fadb14; /* gold */
}
.top-ranking-2 .cell-order {
color: #a9a9a9; /* silver */
}
.top-ranking-3 .cell-order {
color: #d48806; /* bronze */
}
.hover-effect {
cursor: pointer;
transition: color 0.3s ease;
}
.hover-effect:hover {
color: #409eff;
}
</style>
After the optimization, here is the final look:
With that, the XXL-Job makeover of the hot search components is complete. For the full details, see my code repository.
Extra: Bilibili Hot List Crawler
1. Crawler plan assessment
Bilibili doesn't have a hot search list as such; it has a popular video ranking, but the logic is the same. There is a single interface: /x/web-interface/ranking/v2
This interface returns JSON, which is very simple; just look at the structure (an abridged sketch follows).
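An abridged sketch of the response, keeping only the fields the crawler below reads; all values are placeholders:

{
  "code": 0,
  "data": {
    "list": [
      {
        "aid": 170001,
        "title": "sample video title",
        "pic": "https://.../cover.jpg",
        "short_link_v2": "https://b23.tv/xxxx",
        "owner": { "name": "author name", "face": "https://.../avatar.jpg" },
        "stat": { "view": 1234567 }
      }
    ]
  }
}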
2. Web page parsing code
You can use Postman to generate the calling code; I won't repeat that process here. Straight to the code, BilibiliHotSearchJob:
package com.summo.sbmy.job.bilibili; // package path assumed from the summo-sbmy project layout

import java.io.IOException;
import java.util.Calendar;
import java.util.List;
import java.util.stream.Collectors;

import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.google.common.collect.Lists;
import com.xxl.job.core.biz.model.ReturnT;
import com.xxl.job.core.handler.annotation.XxlJob;
import lombok.extern.slf4j.Slf4j;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.apache.commons.collections4.CollectionUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
// project-internal imports (SbmyHotSearchDO, SbmyHotSearchService, HotSearchConvert,
// HotSearchEnum, HotSearchCacheDTO, and the CACHE_MAP static import) were elided in the original

/**
 * @author summo
 * @version 1.0.0
 * @description Bilibili hot list Java crawler code
 * @date 2024-08-19
 */
@Component
@Slf4j
public class BilibiliHotSearchJob {

    @Autowired
    private SbmyHotSearchService sbmyHotSearchService;

    @XxlJob("bilibiliHotSearchJob")
    public ReturnT<String> hotSearch(String param) throws IOException {
        log.info("Bilibili hot search crawler task started");
        try {
            // Query the Bilibili ranking data (host assumed; it was elided in the original)
            OkHttpClient client = new OkHttpClient().newBuilder().build();
            Request request = new Request.Builder().url("https://api.bilibili.com/x/web-interface/ranking/v2")
                .addHeader("User-Agent", "Mozilla/5.0 (compatible)").addHeader("Cookie", "b_nut=1712137652; "
                    + "buvid3=DBA9C433-8738-DD67-DCF5" + "-DDC780CA892052512infoc").method("GET", null).build();
            Response response = client.newCall(request).execute();
            JSONObject jsonObject = JSON.parseObject(response.body().string());
            JSONArray array = jsonObject.getJSONObject("data").getJSONArray("list");
            List<SbmyHotSearchDO> sbmyHotSearchDOList = Lists.newArrayList();
            for (int i = 0, len = array.size(); i < len; i++) {
                // Get one Bilibili ranking entry
                JSONObject object = (JSONObject)array.get(i);
                // Build the hot search record
                SbmyHotSearchDO sbmyHotSearchDO = SbmyHotSearchDO.builder()
                    .hotSearchResource(HotSearchEnum.BILIBILI.getCode()).build();
                // Set the Bilibili third-party ID
                sbmyHotSearchDO.setHotSearchId(object.getString("aid"));
                // Set the video link
                sbmyHotSearchDO.setHotSearchUrl(object.getString("short_link_v2"));
                // Set the video title
                sbmyHotSearchDO.setHotSearchTitle(object.getString("title"));
                // Set the author name
                sbmyHotSearchDO.setHotSearchAuthor(object.getJSONObject("owner").getString("name"));
                // Set the author avatar
                sbmyHotSearchDO.setHotSearchAuthorAvatar(object.getJSONObject("owner").getString("face"));
                // Set the video cover
                sbmyHotSearchDO.setHotSearchCover(object.getString("pic"));
                // Set the heat value (view count)
                sbmyHotSearchDO.setHotSearchHeat(object.getJSONObject("stat").getString("view"));
                // Ordinal order
                sbmyHotSearchDO.setHotSearchOrder(i + 1);
                sbmyHotSearchDOList.add(sbmyHotSearchDO);
            }
            if (CollectionUtils.isEmpty(sbmyHotSearchDOList)) {
                return ReturnT.SUCCESS;
            }
            // Add the data to the cache
            CACHE_MAP.put(HotSearchEnum.BILIBILI.getCode(), HotSearchCacheDTO.builder()
                // Hot search data
                .hotSearchDTOList(
                    sbmyHotSearchDOList.stream().map(HotSearchConvert::toDTOWhenQuery).collect(Collectors.toList()))
                // Update time
                .updateTime(Calendar.getInstance().getTime()).build());
            // Persist the data
            sbmyHotSearchService.saveCache2DB(sbmyHotSearchDOList);
            log.info("Bilibili hot search crawler task finished");
        } catch (IOException e) {
            log.error("Failed to fetch Bilibili data", e);
        }
        return ReturnT.SUCCESS;
    }
}
Looking at the result, all four hot lists in the first row now render, as follows: