
TopK on large data sets: a bucketing + heap + multi-way merge solution

2025-04-02 17:40:17

Using bucketing, heaps and multi-way merging to solve the TopK problem: Analysis of the result processing stage
The TopK problem is a common and challenging task in large-scale data processing: find the largest (or smallest) K elements in a massive data set. To solve it efficiently, we can combine bucketing, heaps and multi-way merging. This article walks through the code for this method in detail, including the logic of the result processing stage.

Problem background
TopK problems arise widely in data processing, search engines, recommendation systems and other fields. To solve them efficiently, this article uses a strategy of bucketing, heaps and multi-way merging. The steps are: first, divide the data into buckets by value range, so that each bucket holds only a small part of the data; then use a min-heap within each bucket to find that bucket's local TopK elements; finally, merge the local TopK elements of every bucket into a global min-heap of size K.

Complete code

`import java.util.*;

 public class TopKSolution {

     public static List<Integer> topK(int[] nums, int k) {
         // Guard: nothing to select (PriorityQueue also rejects an initial capacity < 1)
         if (nums == null || nums.length == 0 || k <= 0) {
             return new ArrayList<>();
         }

         // Step 1: Bucket
         int min = Integer.MAX_VALUE;
         int max = Integer.MIN_VALUE;
         for (int num : nums) {
             min = Math.min(min, num);
             max = Math.max(max, num);
         }
         // Width of the value range covered by each bucket
         int bucketSize = 10;
         // Number of buckets; long arithmetic avoids int overflow in (max - min)
         int bucketCount = (int) (((long) max - min) / bucketSize) + 1;
         List<List<Integer>> buckets = new ArrayList<>();
         for (int i = 0; i < bucketCount; i++) {
             buckets.add(new ArrayList<>());
         }
         // Put each element into the corresponding bucket
         for (int num : nums) {
             int bucketIndex = (int) (((long) num - min) / bucketSize);
             buckets.get(bucketIndex).add(num);
         }

         // Step 2: Use a min-heap in each bucket to find its local TopK
         PriorityQueue<Integer> globalHeap = new PriorityQueue<>(k);
         for (List<Integer> bucket : buckets) {
             if (bucket.isEmpty()) continue;
             // Min-heap of at most k elements (natural ordering) for the current bucket
             PriorityQueue<Integer> localHeap = new PriorityQueue<>(k);
             for (int num : bucket) {
                 if (localHeap.size() < k) {
                     localHeap.offer(num);
                 } else if (num > localHeap.peek()) {
                     // num beats the smallest retained element, so replace it
                     localHeap.poll();
                     localHeap.offer(num);
                 }
             }
             // Merge the local TopK elements of this bucket into the global heap
             for (int num : localHeap) {
                 if (globalHeap.size() < k) {
                     globalHeap.offer(num);
                 } else if (num > globalHeap.peek()) {
                     globalHeap.poll();
                     globalHeap.offer(num);
                 }
             }
         }

         // Step 3: Result processing (copy the heap, then sort in descending order)
         List<Integer> result = new ArrayList<>(globalHeap);
         result.sort(Collections.reverseOrder());
         return result;
     }

     public static void main(String[] args) {
         int[] nums = {3, 2, 1, 5, 6, 4};
         int k = 2;
         List<Integer> topK = topK(nums, k);
         System.out.println("Top " + k + " elements: " + topK);
     }
 } `

Code explanation
Bucketing:

  1. First find the minimum value min and the maximum value max in the array.
  2. Determine the number of buckets bucketCount as (max - min) / bucketSize + 1, where bucketSize is the width of the value range covered by each bucket.
  3. Put each element of the array into the bucket with index (num - min) / bucketSize.
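
The bucketing steps above can be sketched in isolation as follows. This is a minimal illustration, not the article's exact code: the class name `BucketingSketch` and the helpers `bucketIndex` and `bucketize` are hypothetical names introduced here, and the index is computed in `long` arithmetic on the assumption that the value range may exceed what an `int` can hold.

```java
import java.util.ArrayList;
import java.util.List;

class BucketingSketch {

    // Maps a value to its bucket index. The subtraction is done in long
    // arithmetic so that (num - min) cannot overflow an int.
    static int bucketIndex(int num, int min, int bucketSize) {
        return (int) (((long) num - min) / bucketSize);
    }

    // Splits nums into buckets that each cover a value range of width bucketSize.
    static List<List<Integer>> bucketize(int[] nums, int bucketSize) {
        List<List<Integer>> buckets = new ArrayList<>();
        if (nums.length == 0) {
            return buckets;
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int num : nums) {
            min = Math.min(min, num);
            max = Math.max(max, num);
        }
        int bucketCount = bucketIndex(max, min, bucketSize) + 1;
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
        for (int num : nums) {
            buckets.get(bucketIndex(num, min, bucketSize)).add(num);
        }
        return buckets;
    }

    public static void main(String[] args) {
        int[] nums = {3, 25, 1, 14, 6, 29};
        // min = 1, max = 29, bucketSize = 10 gives 3 buckets
        System.out.println(bucketize(nums, 10)); // [[3, 1, 6], [14], [25, 29]]
    }
}
```

Note that the number of buckets grows with the value range rather than with the number of elements, so a fixed bucketSize of 10 may allocate many empty buckets when the data is sparse.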

Heap:

  1. For each bucket, use the min-heap localHeap to find the TopK elements inside that bucket.
  2. If the heap holds fewer than K elements, add the element directly. If the heap already holds K elements and the current element is larger than the top of the heap (the smallest element retained so far), remove the top element and add the current element.
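
The size-bounded min-heap rule described above can be pulled out into a small helper. The sketch below is illustrative only: `BoundedHeapSketch` and `offerBounded` are hypothetical names, and the extra emptiness check is an assumption added so that the helper also tolerates k = 0.

```java
import java.util.PriorityQueue;

class BoundedHeapSketch {

    // Keeps at most k elements in a min-heap. After every call the heap holds
    // the k largest values offered so far, and peek() is the smallest of them.
    static void offerBounded(PriorityQueue<Integer> heap, int k, int num) {
        if (heap.size() < k) {
            heap.offer(num);
        } else if (!heap.isEmpty() && num > heap.peek()) {
            heap.poll();      // drop the smallest retained value
            heap.offer(num);  // keep the larger newcomer instead
        }
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> heap = new PriorityQueue<>();
        for (int num : new int[] {3, 2, 1, 5, 6, 4}) {
            offerBounded(heap, 2, num);
        }
        System.out.println(heap.size()); // 2
        System.out.println(heap.peek()); // 5, the smaller of the two largest values
    }
}
```

A min-heap is used for the largest K because the element most likely to be evicted, the smallest retained one, is always available at the top in constant time.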

Multi-way merge:

  1. Merge the local TopK elements of each bucket into the global min-heap globalHeap, applying the same size-K rule.
  2. Once every bucket has been merged, the global heap holds the largest K elements overall.
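
As a rough sketch of the merge step on its own, the following takes per-bucket candidate lists that have already been computed and folds them into one global min-heap. The names `MergeSketch` and `mergeTopK` are hypothetical, and the input lists in `main` are made-up sample data, not values from the article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class MergeSketch {

    // Folds the local candidates of every bucket into one min-heap of at most
    // k elements, then returns the survivors in descending order.
    static List<Integer> mergeTopK(List<List<Integer>> localResults, int k) {
        List<Integer> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        PriorityQueue<Integer> globalHeap = new PriorityQueue<>(k);
        for (List<Integer> local : localResults) {
            for (int num : local) {
                if (globalHeap.size() < k) {
                    globalHeap.offer(num);
                } else if (num > globalHeap.peek()) {
                    globalHeap.poll();
                    globalHeap.offer(num);
                }
            }
        }
        result.addAll(globalHeap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> localResults = Arrays.asList(
                Arrays.asList(6, 3),    // local candidates of bucket 0
                Arrays.asList(14),      // bucket 1
                Arrays.asList(29, 25)); // bucket 2
        System.out.println(mergeTopK(localResults, 3)); // [29, 25, 14]
    }
}
```

Each bucket contributes at most K candidates, so the global heap processes at most K times the number of non-empty buckets elements, regardless of how large the original input was.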

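Result processing:

  1. Copy the elements of the global heap into a list and sort the list in descending order. The sort is needed because a heap only guarantees that its top element is the smallest; the remaining elements are not stored in sorted order.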
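
The result processing step can be tried out separately with the short sketch below. `ResultSketch` and `toDescendingList` are hypothetical names used only for this illustration. It relies on the documented behavior that a `PriorityQueue` iterator does not traverse elements in any particular order, which is why the copy is sorted explicitly.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

class ResultSketch {

    // Copies the heap contents into a list and sorts it from largest to
    // smallest. The heap itself is left unchanged.
    static List<Integer> toDescendingList(PriorityQueue<Integer> heap) {
        List<Integer> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(Arrays.asList(5, 9, 7));
        System.out.println(toDescendingList(heap)); // [9, 7, 5]
        System.out.println(heap.size());            // 3, the heap was only copied
    }
}
```

An alternative would be to poll the heap K times and reverse the output, but that empties the heap; copying and sorting keeps it intact at the same O(K log K) cost.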