
Spend 100 bucks to make a small fishing site! Part 5 - Getting Hot Searches on a Timed Basis with xxl-job

Published: 2024-09-02 11:36:52

⭐️ Basic Link Navigation ⭐️

Server →☁️ AliCloud event address

See the demo →🐟 Fishing site address

Learning Code →💻 Source code repository address

I. Preface

We have built a complete hot search component, from backend to frontend, forming the core functionality of this little site. Next, we will keep improving it to make it more polished and practical. Today's topic is how to fetch the hot search data on a schedule: if the data cannot be refreshed regularly, the site loses its core value. Previously I used the @Scheduled annotation to implement scheduled tasks, but that approach is not flexible enough, so I decided to replace it with the more flexible XXL-Job component.

II. xxl-job deployment

xxl-job is a lightweight distributed task scheduling platform whose core design goals are rapid development, easy learning, light weight, and easy scaling. Its GitHub repository currently has 27.3k stars; it is open source and free, and well worth learning.

1. Code base download

github code repository address

Once downloaded, the source code is structured as follows:

xxl-job-admin: scheduling center
xxl-job-core: public dependencies
xxl-job-executor-samples: sample executors (pick the appropriate version; you can use one directly, or use it as a reference to turn an existing project into an executor)
    xxl-job-executor-sample-springboot: Spring Boot version, manages the executor through Spring Boot; recommended;
    xxl-job-executor-sample-frameless: frameless version;

Scheduling center configuration description (the property keys below follow the defaults shipped in xxl-job-admin's application.properties):

### Scheduling center JDBC link: keep the link address the same as the scheduling database created in Section 2.1.
spring.datasource.url=jdbc:mysql://127.0.0.1:3306/xxl_job?useUnicode=true&characterEncoding=UTF-8&autoReconnect=true&serverTimezone=Asia/Shanghai
spring.datasource.username=xxx
spring.datasource.password=xxx
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver

### Alarm e-mail
spring.mail.host=smtp.qq.com
spring.mail.port=25
spring.mail.from=xxx@qq.com
spring.mail.username=xxx@qq.com
spring.mail.password=xxx
spring.mail.properties.mail.smtp.auth=true
spring.mail.properties.mail.smtp.starttls.enable=true
spring.mail.properties.mail.smtp.starttls.required=true
spring.mail.properties.mail.smtp.socketFactory.class=javax.net.ssl.SSLSocketFactory

### Scheduling center communication TOKEN [optional]: enabled when non-empty;
xxl.job.accessToken=

### Scheduling center internationalization [required]: defaults to "zh_CN" (Simplified Chinese); options are "zh_CN" (Simplified Chinese), "zh_TC" (Traditional Chinese) and "en" (English);
xxl.job.i18n=zh_CN

### Scheduling thread pool maximum size [required]
xxl.job.triggerpool.fast.max=200
xxl.job.triggerpool.slow.max=100

### Scheduling center log table retention days [required]: expired logs are cleaned up automatically; takes effect when >= 7, otherwise (e.g. -1) automatic cleanup is disabled;
xxl.job.logretentiondays=30

2. Table structure initialization

The doc/db directory of the repository contains a SQL file with the table and data initialization statements that should be executed before running XXL-Job.

After executing it, the tables look as follows:

3. Launch of XXL-Job

Locate XxlJobAdminApplication, start the application, and open http://localhost:12000/xxl-job-admin/toLogin in your browser. You will see the XXL-Job login screen:

Enter the username admin and the password 123456, then click Login to reach the main screen:

III. Customizing Crawler Tasks

XXL-Job is also very simple to use; a single annotation is all it takes. Here is how to use it.

1. Introduce XXL-Job dependencies

In the summo-sbmy-job module, add:

<!-- xxl-job -->
<dependency>
  <groupId>com.xuxueli</groupId>
  <artifactId>xxl-job-core</artifactId>
  <version>2.4.1</version>
</dependency>

2. XXL-Job configuration

Add the XXL-Job configuration to the application configuration file (the keys follow xxl-job's standard executor configuration):

# xxl-job
xxl.job.enabled=true
### xxl-job admin address list, such as "http://address" or "http://address01,http://address02"
xxl.job.admin.addresses=http://127.0.0.1:12000/xxl-job-admin
### xxl-job, access token
xxl.job.accessToken=default_token
### xxl-job executor appname
xxl.job.executor.appname=summo-sbmy
### xxl-job executor log-path
xxl.job.executor.logpath=/root/logs/xxl-job/jobhandler
### xxl-job executor log-retention-days
xxl.job.executor.logretentiondays=30
### xxl-job executor registry-address: default use address to registry, otherwise use ip:port if address is null
xxl.job.executor.address=
### xxl-job executor server-info
xxl.job.executor.ip=
xxl.job.executor.port=9999

3. Create the XxlJobConfig class

After the configuration is done, create an XxlJobConfig class under the config directory; the code is as follows:

package com.summo.sbmy.job.config;

import com.xxl.job.core.executor.impl.XxlJobSpringExecutor;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * xxl-job config
 *
 * @author xuxueli 2017-04-28
 */
@Configuration
public class XxlJobConfig {
    private Logger logger = LoggerFactory.getLogger(XxlJobConfig.class);

    @Value("${xxl.job.admin.addresses}")
    private String adminAddresses;

    @Value("${xxl.job.accessToken}")
    private String accessToken;

    @Value("${xxl.job.executor.appname}")
    private String appname;

    @Value("${xxl.job.executor.address}")
    private String address;

    @Value("${xxl.job.executor.ip}")
    private String ip;

    @Value("${xxl.job.executor.port}")
    private int port;

    @Value("${xxl.job.executor.logpath}")
    private String logPath;

    @Value("${xxl.job.executor.logretentiondays}")
    private int logRetentionDays;

    @Bean
    @ConditionalOnProperty(name = "xxl.job.enabled", havingValue = "true")
    public XxlJobSpringExecutor xxlJobExecutor() {
        logger.info(">>>>>>>>>>> xxl-job config init.");
        XxlJobSpringExecutor xxlJobSpringExecutor = new XxlJobSpringExecutor();
        xxlJobSpringExecutor.setAdminAddresses(adminAddresses);
        xxlJobSpringExecutor.setAppname(appname);
        xxlJobSpringExecutor.setAddress(address);
        xxlJobSpringExecutor.setIp(ip);
        xxlJobSpringExecutor.setPort(port);
        xxlJobSpringExecutor.setAccessToken(accessToken);
        xxlJobSpringExecutor.setLogPath(logPath);
        xxlJobSpringExecutor.setLogRetentionDays(logRetentionDays);

        return xxlJobSpringExecutor;
    }
}

After the configuration and the class are in place, restart the application. If all goes well, you will see that an executor has been registered on the executor screen of the XXL-Job admin console:

4. Registration of XXL-Job tasks

Take the Douyin hot search as an example. We started out using the @Scheduled annotation; the code was as follows:

/**
 * Scheduled crawler trigger, executed once every hour
 */
@Scheduled(fixedRate = 1000 * 60 * 60)
public void hotSearch() throws IOException {
    // ...
}

Replace the @Scheduled annotation with @XxlJob("douyinHotSearchJob"); the full code is as follows:

package com.summo.sbmy.job.douyin;

import java.io.IOException;
import java.util.List;
import java.util.Random;
import java.util.UUID;
import java.util.stream.Collectors;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.google.common.collect.Lists;
import com.summo.sbmy.common.enums.HotSearchEnum;
import com.summo.sbmy.dal.entity.SbmyHotSearchDO;
import com.summo.sbmy.service.SbmyHotSearchService;
import com.summo.sbmy.service.convert.HotSearchConvert;
import com.xxl.job.core.biz.model.ReturnT;
import com.xxl.job.core.handler.annotation.XxlJob;
import lombok.extern.slf4j.Slf4j;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.apache.commons.collections4.CollectionUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import static com.summo.sbmy.service.HotSearchCacheManager.CACHE_MAP;

/**
 * @author summo
 * @version 1.0.0
 * @description Douyin hot search Java crawler code
 * @date 2024-08-09
 */
@Component
@Slf4j
public class DouyinHotSearchJob {

    @Autowired
    private SbmyHotSearchService sbmyHotSearchService;

    @XxlJob("douyinHotSearchJob")
    public ReturnT<String> hotSearch(String param) throws IOException {
        log.info("Douyin hot search crawler task started");
        try {
            // Query the Douyin hot search data
            OkHttpClient client = new OkHttpClient().newBuilder().build();
            Request request = new Request.Builder()
                .url("https://www.iesdouyin.com/web/api/v2/hotsearch/billboard/word/")
                .method("GET", null).build();
            Response response = client.newCall(request).execute();
            JSONObject jsonObject = JSONObject.parseObject(response.body().string());
            JSONArray array = jsonObject.getJSONArray("word_list");
            List<SbmyHotSearchDO> sbmyHotSearchDOList = Lists.newArrayList();
            for (int i = 0, len = array.size(); i < len; i++) {
                // Get one Douyin hot search entry
                JSONObject object = (JSONObject)array.get(i);
                // Build the hot search record
                SbmyHotSearchDO sbmyHotSearchDO = SbmyHotSearchDO.builder()
                    .hotSearchResource(HotSearchEnum.DOUYIN.getCode()).build();
                // Set the article title
                sbmyHotSearchDO.setHotSearchTitle(object.getString("word"));
                // Set the Douyin third-party ID
                sbmyHotSearchDO.setHotSearchId(getHashId(HotSearchEnum.DOUYIN.getDesc() + object.getString("word")));
                // Set the article link
                sbmyHotSearchDO.setHotSearchUrl("https://www.douyin.com/search/" + object.getString("word") + "?type=general");
                // Set the hot search heat
                sbmyHotSearchDO.setHotSearchHeat(object.getString("hot_value"));
                // Ordinal order
                sbmyHotSearchDO.setHotSearchOrder(i + 1);
                sbmyHotSearchDOList.add(sbmyHotSearchDO);
            }
            if (CollectionUtils.isEmpty(sbmyHotSearchDOList)) {
                return ReturnT.SUCCESS;
            }
            // Add the data to the cache
            CACHE_MAP.put(HotSearchEnum.DOUYIN.getCode(),
                sbmyHotSearchDOList.stream().map(HotSearchConvert::toDTOWhenQuery).collect(Collectors.toList()));

            // Persist the data
            sbmyHotSearchService.saveCache2DB(sbmyHotSearchDOList);
            log.info("Douyin hot search crawler task finished");
        } catch (IOException e) {
            log.error("Exception while fetching Douyin data", e);
        }
        return ReturnT.SUCCESS;
    }

    /**
     * Derive a unique ID from the article title
     *
     * @param title article title
     * @return unique ID
     */
    private String getHashId(String title) {
        long seed = title.hashCode();
        Random rnd = new Random(seed);
        return new UUID(rnd.nextLong(), rnd.nextLong()).toString();
    }

}
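A note on getHashId: because the Random is seeded with the title's hashCode, the generated UUID is deterministic per title, so repeated crawls map the same entry to the same third-party ID. A minimal sketch of this behavior (HashIdDemo is a hypothetical class name; the derivation mirrors the method above):

```java
import java.util.Random;
import java.util.UUID;

public class HashIdDemo {
    // Same derivation as the job's getHashId: seed a Random with the title's
    // hashCode, so one title always yields the same UUID string.
    static String getHashId(String title) {
        Random rnd = new Random(title.hashCode());
        return new UUID(rnd.nextLong(), rnd.nextLong()).toString();
    }

    public static void main(String[] args) {
        // Repeated crawls of the same entry produce the same ID...
        System.out.println(getHashId("some hot topic").equals(getHashId("some hot topic"))); // true
        // ...while different titles get different IDs.
        System.out.println(getHashId("some hot topic").equals(getHashId("another topic"))); // false
    }
}
```

The trade-off of this scheme is that it relies on titles being unique per source; two sources sharing a title are disambiguated by prefixing the source description before hashing, as the job does.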

Click Add Task in the Task Management screen of the XXL-Job Management Console as follows:

After creating the task, we can run it once manually as follows:

With that, the Douyin hot search task is configured; the other crawler tasks are configured in the same way.

IV. Hot search update time

So far we have implemented three hot search components: Baidu, Douyin, and Zhihu. But we don't know when these hot searches were last updated, or whether they are real-time, so we need to display the hot search update time, roughly like this:


The optimized component code is as follows:

<template>
  <el-card class="custom-card" v-loading="loading">
    <template #header>
      <div class="card-title">
        <img :src="icon" class="card-title-icon" />
        {{ title }} Hot List
        <span class="update-time">{{ formattedUpdateTime }}</span>
      </div>
    </template>
    <div class="cell-group-scrollable">
      <div
        v-for="item in hotSearchData"
        :key="item.hotSearchOrder"
        :class="getRankingClass(item.hotSearchOrder)"
        class="cell-wrapper"
      >
        <span class="cell-order">{{ item.hotSearchOrder }}</span>
        <span
          class="cell-title hover-effect"
          @click="openLink(item.hotSearchUrl)"
        >
          {{ item.hotSearchTitle }}
        </span>
        <span class="cell-heat">{{ formatHeat(item.hotSearchHeat) }}</span>
      </div>
    </div>
  </el-card>
</template>

<script>
import apiService from "@/config/apiService";

export default {
  props: {
    title: String,
    icon: String,
    type: String,
  },
  data() {
    return {
      hotSearchData: [],
      updateTime: null,
      loading: false,
    };
  },
  created() {
    this.fetchData(this.type);
  },
  computed: {
    formattedUpdateTime() {
      if (!this.updateTime) return "";

      const updateDate = new Date(this.updateTime);
      const now = new Date();

      const timeDiff = now - updateDate;
      const minutesDiff = Math.floor(timeDiff / 1000 / 60);

      if (minutesDiff < 1) {
        return "Just updated";
      } else if (minutesDiff < 60) {
        return `Updated ${minutesDiff} minutes ago`;
      } else if (minutesDiff < 1440) {
        return `Updated ${Math.floor(minutesDiff / 60)} hours ago`;
      } else {
        return updateDate.toLocaleString();
      }
    },
  },
  methods: {
    fetchData(type) {
      this.loading = true;
      apiService
        .get("/hotSearch/queryByType?type=" + type)
        .then((res) => {
          this.hotSearchData = res.data.data.hotSearchDTOList;
          this.updateTime = res.data.data.updateTime;
        })
        .catch((error) => {
          console.error(error);
        })
        .finally(() => {
          this.loading = false;
        });
    },
    getRankingClass(order) {
      if (order === 1) return "top-ranking-1";
      if (order === 2) return "top-ranking-2";
      if (order === 3) return "top-ranking-3";
      return "";
    },
    formatHeat(heat) {
      // "万" is the Chinese unit for ten thousand
      if (typeof heat === "string" && heat.includes("万")) {
        return heat;
      }
      let number = parseFloat(heat);
      if (isNaN(number)) {
        return heat;
      }
      if (number < 1000) {
        return number.toString();
      }
      if (number >= 1000 && number < 10000) {
        return (number / 1000).toFixed(1) + "k";
      }
      if (number >= 10000) {
        return (number / 10000).toFixed(1) + "万";
      }
    },
    openLink(url) {
      if (url) {
        window.open(url, "_blank");
      }
    },
  },
};
</script>

<style scoped>
.custom-card {
  background-color: #ffffff;
  border-radius: 10px;
  box-shadow: 0 4px 8px rgba(0, 0, 0, 0.1);
  margin-bottom: 20px;
}
.custom-card:hover {
  box-shadow: 0 6px 8px rgba(0, 0, 0, 0.25);
}
.el-card__header {
  padding: 10px 18px;
  display: flex;
  justify-content: space-between; /* Added to space out title and update time */
  align-items: center;
}
.card-title {
  display: flex;
  align-items: center;
  font-weight: bold;
  font-size: 16px;
  flex-grow: 1;
}
.card-title-icon {
  fill: currentColor;
  width: 24px;
  height: 24px;
  margin-right: 8px;
}
.update-time {
  font-size: 12px;
  color: #b7b3b3;
  margin-left: auto; /* Ensures it is pushed to the far right */
}
.cell-group-scrollable {
  max-height: 350px;
  overflow-y: auto;
  padding-right: 16px;
  flex: 1;
}
.cell-wrapper {
  display: flex;
  align-items: center;
  padding: 8px 8px;
  border-bottom: 1px solid #e8e8e8;
}
.cell-order {
  width: 20px;
  text-align: left;
  font-size: 16px;
  font-weight: 700;
  margin-right: 8px;
  color: #7a7a7a;
}
.cell-heat {
  min-width: 50px;
  text-align: right;
  font-size: 12px;
  color: #7a7a7a;
}
.cell-title {
  font-size: 13px;
  color: #495060;
  line-height: 22px;
  flex-grow: 1;
  overflow: hidden;
  text-align: left;
  text-overflow: ellipsis;
}
.top-ranking-1 .cell-order {
  color: #fadb14; /* gold (color) */
}
.top-ranking-2 .cell-order {
  color: #a9a9a9; /* silver (color) */
}
.top-ranking-3 .cell-order {
  color: #d48806; /* copper color */
}
.hover-effect {
  cursor: pointer;
  transition: color 0.3s ease;
}
.hover-effect:hover {
  color: #409eff;
}
</style>
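The formatHeat buckets in the component can also be expressed in Java: values below 1000 pass through, 1000–9999 become "x.xk", and 10000+ become "x.x万" (万 = ten thousand). A sketch mirroring the Vue method (HeatFormatDemo is a hypothetical name, not part of the project):

```java
import java.util.Locale;

public class HeatFormatDemo {
    // Mirrors the component's formatHeat: pre-formatted "万" strings and
    // non-numeric values pass through; otherwise bucket by magnitude.
    static String formatHeat(String heat) {
        if (heat.contains("万")) {
            return heat;
        }
        double number;
        try {
            number = Double.parseDouble(heat);
        } catch (NumberFormatException e) {
            return heat;
        }
        if (number < 1000) {
            return String.valueOf((long) number);
        }
        if (number < 10000) {
            return String.format(Locale.ROOT, "%.1fk", number / 1000);
        }
        return String.format(Locale.ROOT, "%.1f万", number / 10000);
    }

    public static void main(String[] args) {
        System.out.println(formatHeat("523"));     // 523
        System.out.println(formatHeat("4521"));    // 4.5k
        System.out.println(formatHeat("1234567")); // 123.5万
    }
}
```

Locale.ROOT is used so the decimal separator is always a dot, matching what JavaScript's toFixed(1) produces.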

After the optimization, let's look at the final style:

With that, the XXL-Job transformation of the hot search components is complete. For the full details, see my code repository.
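The update-time display boils down to three thresholds: under 1 minute, under 60 minutes, under 1440 minutes (one day), then a full timestamp. A Java sketch of the same bucketing (UpdateTimeDemo is a hypothetical name; the real component formats the final fallback as a locale date string):

```java
public class UpdateTimeDemo {
    // Same thresholds as the component's formattedUpdateTime computed property.
    static String format(long minutesDiff) {
        if (minutesDiff < 1) {
            return "Just updated";
        }
        if (minutesDiff < 60) {
            return "Updated " + minutesDiff + " minutes ago";
        }
        if (minutesDiff < 1440) {
            return "Updated " + (minutesDiff / 60) + " hours ago";
        }
        // The component falls back to the full update timestamp here.
        return "over a day ago";
    }

    public static void main(String[] args) {
        System.out.println(format(0));   // Just updated
        System.out.println(format(45));  // Updated 45 minutes ago
        System.out.println(format(130)); // Updated 2 hours ago
    }
}
```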

Extra: B Station Hot Search Crawler

1. Crawler plan evaluation

Bilibili doesn't have a hot search list as such; what it has is a popular video ranking, but the logic is the same. It is a single interface: https://api.bilibili.com/x/web-interface/ranking/v2

This interface returns data in JSON format, which is very simple to handle; just look at the structure.
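Based on the fields the crawler below reads, the response shape is roughly as follows (a sketch inferred from the parsing code, not the official schema; all values are illustrative):

```json
{
  "data": {
    "list": [
      {
        "aid": "170001",
        "title": "a popular video title",
        "short_link_v2": "https://b23.tv/xxxxx",
        "pic": "https://example.com/cover.jpg",
        "owner": { "name": "author name", "face": "https://example.com/avatar.jpg" },
        "stat": { "view": "1234567" }
      }
    ]
  }
}
```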

2. Web page parsing code

You can use Postman to generate the calling code; I won't repeat that process. Straight to the code, BilibiliHotSearchJob:

package com.summo.sbmy.job.bilibili;

import java.io.IOException;
import java.util.Date;
import java.util.List;
import java.util.stream.Collectors;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.google.common.collect.Lists;
import com.summo.sbmy.common.enums.HotSearchEnum;
import com.summo.sbmy.dal.entity.SbmyHotSearchDO;
import com.summo.sbmy.service.HotSearchCacheManager;
import com.summo.sbmy.service.SbmyHotSearchService;
import com.summo.sbmy.service.convert.HotSearchConvert;
import com.xxl.job.core.biz.model.ReturnT;
import com.xxl.job.core.handler.annotation.XxlJob;
import lombok.extern.slf4j.Slf4j;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.apache.commons.collections4.CollectionUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import static com.summo.sbmy.service.HotSearchCacheManager.CACHE_MAP;

/**
 * @author summo
 * @version 1.0.0
 * @description Bilibili hot list Java crawler code
 * @date 2024-08-19
 */
@Component
@Slf4j
public class BilibiliHotSearchJob {

    @Autowired
    private SbmyHotSearchService sbmyHotSearchService;

    @XxlJob("bilibiliHotSearchJob")
    public ReturnT<String> hotSearch(String param) throws IOException {
        log.info("Bilibili hot search crawler task started");
        try {
            // Query the Bilibili popular video data
            OkHttpClient client = new OkHttpClient().newBuilder().build();
            Request request = new Request.Builder().url("https://api.bilibili.com/x/web-interface/ranking/v2")
                .addHeader("User-Agent", "Mozilla/5.0 (compatible)").addHeader("Cookie", "b_nut=1712137652; "
                    + "buvid3=DBA9C433-8738-DD67-DCF5" + "-DDC780CA892052512infoc").method("GET", null).build();
            Response response = client.newCall(request).execute();
            JSONObject jsonObject = JSONObject.parseObject(response.body().string());
            JSONArray array = jsonObject.getJSONObject("data").getJSONArray("list");
            List<SbmyHotSearchDO> sbmyHotSearchDOList = Lists.newArrayList();
            for (int i = 0, len = array.size(); i < len; i++) {
                // Get one Bilibili hot video entry
                JSONObject object = (JSONObject)array.get(i);
                // Build the hot search record
                SbmyHotSearchDO sbmyHotSearchDO = SbmyHotSearchDO.builder()
                    .hotSearchResource(HotSearchEnum.BILIBILI.getCode())
                    .build();
                // Set the Bilibili third-party ID
                sbmyHotSearchDO.setHotSearchId(object.getString("aid"));
                // Set the article link
                sbmyHotSearchDO.setHotSearchUrl(object.getString("short_link_v2"));
                // Set the article title
                sbmyHotSearchDO.setHotSearchTitle(object.getString("title"));
                // Set the author name
                sbmyHotSearchDO.setHotSearchAuthor(object.getJSONObject("owner").getString("name"));
                // Set the author avatar
                sbmyHotSearchDO.setHotSearchAuthorAvatar(object.getJSONObject("owner").getString("face"));
                // Set the article cover
                sbmyHotSearchDO.setHotSearchCover(object.getString("pic"));
                // Set the hot search heat
                sbmyHotSearchDO.setHotSearchHeat(object.getJSONObject("stat").getString("view"));
                // Ordinal order
                sbmyHotSearchDO.setHotSearchOrder(i + 1);
                sbmyHotSearchDOList.add(sbmyHotSearchDO);
            }
            if (CollectionUtils.isEmpty(sbmyHotSearchDOList)) {
                return ReturnT.SUCCESS;
            }
            // Add the data to the cache
            CACHE_MAP.put(HotSearchEnum.BILIBILI.getCode(), HotSearchCacheManager.CacheObj.builder()
                // Hot search data
                .hotSearchDTOList(
                    sbmyHotSearchDOList.stream().map(HotSearchConvert::toDTOWhenQuery).collect(Collectors.toList()))
                // Update time
                .updateTime(new Date().getTime()).build());
            // Persist the data
            sbmyHotSearchService.saveCache2DB(sbmyHotSearchDOList);
            log.info("Bilibili hot search crawler task finished");
        } catch (IOException e) {
            log.error("Exception while fetching Bilibili data", e);
        }
        return ReturnT.SUCCESS;
    }

}

Looking at the result, the four hot lists in the first row are now showing: