
Spend 100 bucks to make a small fish-touching website! Part 8 - Adding a Word Cloud Component and a Search Component


⭐️ Basic Link Navigation ⭐️

Server → ☁️ Alibaba Cloud promotion address

See the demo → 🐟 Fish-touching site address

Learn the code → 💻 Source code repository address

I. Preface

Hello everyone, I'm summo. The little site was down for a few days recently, for two reasons: the SSL certificate expired, and the free RDS instance expired too. I've been studying while looking for a job, so I didn't get around to fixing it right away, sorry! (PS: the "eight-legged essay" interview questions are so hard to memorize, and algorithm problems are so hard to grind.)

The little site already has quite a lot of content and components, so today we'll keep enriching its functionality and make it more attractive and useful. We'll add a word cloud component and a search component, and also rearrange the site's layout; none of it is difficult, but it is quite interesting. Let's start with the word cloud component.

II. Word Cloud Component

Different platforms' hot searches partly overlap and partly differ. The word cloud component splits the hot search titles into words, counts how often each word appears, and surfaces the most frequent ones, making it easy to see at a glance what the hottest topics are.

1. jieba

jieba ("stutterer" in Chinese) is a word segmentation library that performs intelligent Chinese word splitting. It started out as a Python package, and Huaban later developed a Java port, jieba-analysis.
Source code link: /huaban/jieba-analysis

(1) maven dependencies

<!-- jieba word segmentation -->
<dependency>
  <groupId>com.huaban</groupId>
  <artifactId>jieba-analysis</artifactId>
  <version>1.0.2</version>
</dependency>

(2) Write a demo to try out the segmenter

The demo is as follows:

// Package name assumed; the imports and identifiers below are reconstructed
// from context, so adjust them to match the actual source tree.
package com.summo.sbmy.demo;

import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import com.huaban.analysis.jieba.JiebaSegmenter;

public class WordCloudTest {

    public static void main(String[] args) {
        // Titles of the previous posts in this series, used as sample input
        List<String> titleList = Arrays.asList(
                "Spend 100 bucks to make a small fish-touching website! Part 7 - Who visited our site?",
                "Spend 100 bucks to make a small fish-touching website! Part 6 - Deploying the site to a cloud server",
                "Spend 100 bucks to make a small fish-touching website! Part 5",
                "Spend 100 bucks to make a small fish-touching website! Part 4 - Front-end application build and the first hot search component",
                "Spend 100 bucks to make a small fish-touching website! Part 3 - Hot search table structure design and hot search data storage",
                "Spend 100 bucks to make a small fish-touching website! Part 2 - Back-end application build and the first crawler",
                "Spend 100 bucks to make a small fish-touching website! Part 1 - Buying a cloud server and initializing the environment",
                "Spend 100 bucks to make a small fish-touching website! Prologue - Inspiration");
        JiebaSegmenter segmenter = new JiebaSegmenter();
        Map<String, Integer> wordCount = new HashMap<>();

        // Split every title into words and count each word's occurrences
        for (String title : titleList) {
            List<String> words = segmenter.sentenceProcess(title);
            for (String word : words) {
                wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
            }
        }

        // Print the word frequencies
        wordCount.forEach((word, count) -> System.out.println("word->" + word + ";count->" + count));
    }

}

The results of the run are as follows:

The output shows that the titles have been split into words and counted, but there are also many meaningless words, such as "a", "and", "of". Words like these are called stop words, and they generally have to be filtered out. You can search online for common stop word lists, then exclude those words while setting the weights. The stop word list I use has been committed to the repository, so you can grab it directly.
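If you want to try this in the demo above, here is a minimal sketch of loading and applying a stop word list. It assumes the list is stored as a comma-separated text file; the file path and helper names are hypothetical, not the repository's actual code:

package com.summo.sbmy.demo;

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StopWordFilterDemo {

    // Load a comma-separated stop word file into a set (path is hypothetical)
    static Set<String> loadStopWords(String path) throws IOException {
        String content = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        return new HashSet<>(Arrays.asList(content.split(",")));
    }

    // Drop stop words and bare single characters from the frequency map
    static void removeStopWords(Map<String, Integer> wordCount, Set<String> stopWords) {
        wordCount.keySet().removeIf(word -> stopWords.contains(word) || word.trim().length() <= 1);
    }
}

After the counting loop in WordCloudTest, a call like removeStopWords(wordCount, loadStopWords("stopwords.txt")) leaves only the meaningful words.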

(3) Hot Search Title Segmentation Interface

// Package and some helper names below (ConfigUtil, ResultModel, WordCloudDTO,
// HotSearchDTO, HotSearchCacheManager) follow this project's conventions and are
// reconstructed from context; adjust them to match the actual source tree.
package com.summo.sbmy.web.controller;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.huaban.analysis.jieba.JiebaSegmenter;
import org.apache.commons.collections4.CollectionUtils;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/hotSearch/wordCloud")
public class WordCloudController {

    private static Set<String> STOP_WORDS;
    private static JSONArray WEIGHT_WORDS_ARRAY;

    @RequestMapping("/queryWordCloud")
    public ResultModel<List<WordCloudDTO>> queryWordCloud(@RequestParam(required = true) Integer topN) {
        List<HotSearchDTO> hotSearchDTOS = gatherHotSearchData();
        List<String> titleList = hotSearchDTOS.stream().map(HotSearchDTO::getHotSearchTitle).collect(Collectors.toList());
        return ResultModel.success(findTopFrequentNouns(titleList, topN));
    }

    /**
     * Load the stop word / weighted word config and gather the cached hot search data
     *
     * @return all hot search entries currently in the cache
     */
    private List<HotSearchDTO> gatherHotSearchData() {
        String stopWordsStr = ConfigUtil.getConfig("WordCloud", "StopWords");
        STOP_WORDS = new HashSet<>(Arrays.asList(stopWordsStr.split(",")));
        WEIGHT_WORDS_ARRAY = JSONArray.parseArray(ConfigUtil.getConfig("WordCloud", "WeightWords"));
        List<HotSearchDTO> hotSearchDTOS = new ArrayList<>();
        HotSearchCacheManager.CACHE_MAP.forEach((key, detail) -> hotSearchDTOS.addAll(detail.getHotSearchDTOList()));
        return hotSearchDTOS;
    }

    /**
     * Word segmentation
     *
     * @param titleList title list
     * @param topN      number of hot words to keep
     * @return the topN most frequent words
     */
    public static List<WordCloudDTO> findTopFrequentNouns(List<String> titleList, int topN) {
        JiebaSegmenter segmenter = new JiebaSegmenter();
        Map<String, Integer> wordCount = new HashMap<>();

        // Split every title and count word frequencies
        for (String title : titleList) {
            for (String word : segmenter.sentenceProcess(title)) {
                wordCount.put(word, wordCount.getOrDefault(word, 0) + 1);
            }
        }

        return wordCount.entrySet().stream()
                // stop word filter
                .filter(entry -> !STOP_WORDS.contains(entry.getKey()))
                // construct the DTO
                .map(entry -> WordCloudDTO.builder().word(entry.getKey()).rate(entry.getValue()).build())
                // manual replacement and weighting
                .map(wordCloudDTO -> {
                    if (CollectionUtils.isEmpty(WEIGHT_WORDS_ARRAY)) {
                        return wordCloudDTO;
                    }
                    WEIGHT_WORDS_ARRAY.forEach(weightedWord -> {
                        JSONObject tempObject = (JSONObject) weightedWord;
                        if (wordCloudDTO.getWord().equals(tempObject.getString("originWord"))) {
                            wordCloudDTO.setWord(tempObject.getString("targetWord"));
                            if (tempObject.containsKey("weight")) {
                                wordCloudDTO.setRate(tempObject.getInteger("weight"));
                            }
                        }
                    });
                    return wordCloudDTO;
                })
                // sort by frequency of occurrence
                .sorted(Comparator.comparing(WordCloudDTO::getRate).reversed())
                // keep the top N entries
                .limit(topN)
                .collect(Collectors.toList());
    }

}

Here I added a weight replacement step, because I found that the segmenter has trouble with some hot words. For example, "Black Myth: Wukong" was very popular a while ago, but "Black Myth" is not a standard Chinese word, so jieba only recognizes "Myth" when segmenting. To work around this, I added a manual replacement step.
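For reference, a WeightWords config entry could look like this. The keys originWord, targetWord, and weight are the ones the controller reads; the concrete values are only an illustration:

[
  {
    "originWord": "Myth",
    "targetWord": "Black Myth: Wukong",
    "weight": 99
  }
]

With this entry, every "Myth" token produced by the segmenter is replaced by "Black Myth: Wukong", and its frequency is overridden by the configured weight.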

2. Front-end component

(1) vue-wordcloud component

The official documentation for the component is linked here: /package/vue-wordcloud

The install command is as follows: cnpm install vue-wordcloud

(2) Component Code

<template>
  <el-card class="word-cloud-card">
    <wordcloud
      class="word-cloud"
      :data="words"
      nameKey="name"
      valueKey="value"
      :wordPadding="2"
      :fontSize="[10,50]"
      :showTooltip="true"
      :wordClick="wordClickHandler"
    />
  </el-card>
</template>

<script>
import wordcloud from "vue-wordcloud";
import apiService from "@/config/";

export default {
  name: "app",
  components: {
    wordcloud,
  },
  methods: {
    wordClickHandler(name, value, vm) {
      console.log("wordClickHandler", name, value, vm);
    },
  },
  data() {
    return {
      words: [],
    };
  },
  created() {
    apiService
      .get("/hotSearch/wordCloud/queryWordCloud?topN=100")
      .then((res) => {
        // Map the WordCloudDTO fields (word/rate) onto the chart's name/value keys
        this.words = res.data.map((item) => ({
          value: item.rate,
          name: item.word,
        }));
      })
      .catch((error) => {
        // Handle the error case
        console.error(error);
      });
  },
};
</script>
<style scoped>
.word-cloud-card {
  padding: 0% !important;
  max-height: 300px;
  margin-top: 10px;
}
.word-cloud {
  max-height: 300px;
}
>>> .el-card__body {
  padding: 0;
}
</style>

The component is easy to use and the result looks decent, but it introduced a small bug: after adding it, a blank area appears at the bottom of the site, and I haven't figured out how to fix it yet.

III. Re-layout and Search Component

1. Re-layout

As the little site gains more and more components, the overall layout needs a redesign. The current rough layout is as follows:

The layout uses the layout components that come with ElementUI:

<el-container>
  <el-header> ... </el-header>
  <el-main> ... </el-main>
  <el-footer> ... </el-footer>
</el-container>

2. Search component

The search component uses ElementUI's <el-autocomplete>. The component itself is not difficult; the only thing to note is that search results from different platforms may duplicate each other, so we need to add a source identifier to each result.
To do that, we assemble a custom suggestion item using a scoped slot, which looks like this:

The component code is as follows:

<template slot-scope="{ item }">
  <div style="display: flex; justify-content: space-between">
    <span style="max-width: 280px;overflow: hidden;text-overflow: ellipsis;white-space: nowrap;">
      <!-- title field name follows HotSearchDTO; reconstructed from context -->
      {{ item.hotSearchTitle }}
    </span>
    <span style="max-width: 80px; color: #8492a6; font-size: 13px; white-space: nowrap; " >
      <!-- getResourceInfo resolves the source platform's icon and display name -->
      <img :src="getResourceInfo(item.hotSearchResource).icon" style="width: 16px; height: 16px; vertical-align: middle"/>
        {{ getResourceInfo(item.hotSearchResource).title }}
    </span>
  </div>
</template>

You can check my source code for the exact logic; I won't post the whole thing here.
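As a rough idea of what feeds those suggestions, a back-end endpoint could search the cached hot searches by keyword. The sketch below is a hypothetical illustration, not the site's actual code: the controller name and request path are assumptions, while HotSearchCacheManager, HotSearchDTO, and ResultModel are the project classes used earlier in this post.

package com.summo.sbmy.web.controller;

import java.util.ArrayList;
import java.util.List;

import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/api/hotSearch/search")
public class HotSearchSuggestController {

    @RequestMapping("/suggest")
    public ResultModel<List<HotSearchDTO>> suggest(@RequestParam String keyword) {
        List<HotSearchDTO> matches = new ArrayList<>();
        // Scan every platform's cached list; keeping the source on each DTO lets
        // the front end tell identical titles from different platforms apart
        HotSearchCacheManager.CACHE_MAP.forEach((source, detail) ->
                detail.getHotSearchDTOList().stream()
                        .filter(dto -> dto.getHotSearchTitle().contains(keyword))
                        .forEach(matches::add));
        return ResultModel.success(matches);
    }
}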

IV. Summary

None of these widgets were planned from the start; most of them were done on a whim. Some of them may not seem all that useful, but it has been great to watch the little site's content grow. I've committed all the source code to Gitee, though I haven't had a chance to tidy it up yet. Going forward, besides sharing how to build the widgets, I'll also share some of the bugs and problems I've run into over the past 4 months, and why my code is written the way it is.

Extra: Toutiao Hot Search Crawler

1. Crawler approach

Toutiao's hot search interface returns data directly as a JSON string, which is simple enough to save us the trouble of parsing the DOM. The access link is: /hot-event/hot-board/?origin=toutiao_pc
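The response body looks roughly like this (abridged sample; the field names match what the parsing code below reads, while the values are made up):

{
  "data": [
    {
      "ClusterIdStr": "7400000000000000000",
      "Title": "Some trending headline",
      "Url": "https://www.toutiao.com/trending/...",
      "HotValue": "2987654"
    }
  ]
}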

2. Parsing code

// Package and a few helper names below (HotSearchEnum, HotSearchDetailDTO,
// HotSearchConvert, SbmyHotSearchService, and the DO's setter names) are
// reconstructed from context; adjust them to match the actual repository.
package com.summo.sbmy.job.toutiao;

import java.io.IOException;
import java.util.Calendar;
import java.util.List;
import java.util.stream.Collectors;

import javax.annotation.PostConstruct;

import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import com.google.common.collect.Lists;
import com.xxl.job.core.biz.model.ReturnT;
import com.xxl.job.core.handler.annotation.XxlJob;
import lombok.extern.slf4j.Slf4j;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import org.apache.commons.collections4.CollectionUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

import static com.summo.sbmy.cache.HotSearchCacheManager.CACHE_MAP;

/**
 * @author summo
 * @version ToutiaoHotSearchJob.java, 1.0.0
 * @description Toutiao hot search Java crawler code
 * @date 2024/08/09
 */
@Component
@Slf4j
public class ToutiaoHotSearchJob {

    @Autowired
    private SbmyHotSearchService sbmyHotSearchService;

    @XxlJob("toutiaoHotSearchJob")
    public ReturnT<String> hotSearch(String param) throws IOException {
        log.info("Toutiao hot search crawler task started");
        try {
            // Query today's Toutiao hot search data
            OkHttpClient client = new OkHttpClient().newBuilder().build();
            Request request = new Request.Builder().url(
                    "https://www.toutiao.com/hot-event/hot-board/?origin=toutiao_pc").method("GET", null).build();
            Response response = client.newCall(request).execute();
            JSONObject jsonObject = JSONObject.parseObject(response.body().string());
            JSONArray array = jsonObject.getJSONArray("data");
            List<SbmyHotSearchDO> sbmyHotSearchDOList = Lists.newArrayList();
            for (int i = 0, len = array.size(); i < len; i++) {
                // Get one Toutiao hot search entry
                JSONObject object = (JSONObject)array.get(i);
                // Build the hot search record
                SbmyHotSearchDO sbmyHotSearchDO = SbmyHotSearchDO.builder().hotSearchResource(
                        HotSearchEnum.TOUTIAO.getCode()).build();
                // Set Toutiao's third-party ID
                sbmyHotSearchDO.setHotSearchId(object.getString("ClusterIdStr"));
                // Set the article link
                sbmyHotSearchDO.setHotSearchUrl(object.getString("Url"));
                // Set the article title
                sbmyHotSearchDO.setHotSearchTitle(object.getString("Title"));
                // Set the heat of the hot search
                sbmyHotSearchDO.setHotSearchHeat(object.getString("HotValue"));
                // Set the order
                sbmyHotSearchDO.setHotSearchOrder(i + 1);
                sbmyHotSearchDOList.add(sbmyHotSearchDO);
            }
            if (CollectionUtils.isEmpty(sbmyHotSearchDOList)) {
                return ReturnT.SUCCESS;
            }
            // Add the data to the cache
            CACHE_MAP.put(HotSearchEnum.TOUTIAO.getCode(), HotSearchDetailDTO.builder()
                    // hot search data
                    .hotSearchDTOList(sbmyHotSearchDOList.stream().map(HotSearchConvert::toDTOWhenQuery).collect(Collectors.toList()))
                    // update time
                    .updateTime(Calendar.getInstance().getTime()).build());
            // Persist the data
            sbmyHotSearchService.saveCache2DB(sbmyHotSearchDOList);
            log.info("Toutiao hot search crawler task finished");
        } catch (IOException e) {
            log.error("Failed to fetch Toutiao data", e);
        }
        return ReturnT.SUCCESS;
    }

    @PostConstruct
    public void init() {
        // Run the crawler once at startup
        try {
            hotSearch(null);
        } catch (IOException e) {
            log.error("Failed to start the crawler task", e);
        }
    }
}