
Focusing on OLAP performance improvement, Volcano Engine ByteHouse releases six scenario solutions

2024-08-08 10:48:20
 
Performance is critical in data analytics, as it directly determines the efficiency and timeliness of data processing, further impacting data-driven business decisions.
 
Take a large e-commerce company as an example. It generates a huge amount of transaction data every day, and its data analytics team uses this data to understand customers' purchasing behavior, popular products, and sales trends in order to develop marketing strategies and optimize operations. If the data volume is huge and the system is slow, a report on the past month's sales can take anywhere from tens of minutes to hours to produce. During promotional campaigns, a team that cannot get the latest sales data in time to evaluate the campaign cannot adjust its strategy in time and misses marketing opportunities.
 
As digital transformation deepens, not only e-commerce but every industry needs to make critical decisions based on real-time data. When an enterprise's underlying data systems handle complex analysis tasks such as multi-dimensional correlation analysis or data mining, they may suffer delays, crashes, and errors, which leads to inaccurate, incomplete, or untimely data.
 
As the OLAP product launched by Volcano Engine, ByteHouse is built for high performance and is widely used in data analysis scenarios at ByteDance and at external enterprises. As early as February 2022, ByteHouse's deployment inside ByteDance exceeded 18,000 nodes, with a single cluster of more than 2,400 nodes. To address common pain points in data analysis, namely slow real-time throughput, slow BI reports, and slow online/offline analysis, ByteHouse has introduced performance enhancement solutions for six scenarios to further improve OLAP performance and help enterprises make decisions more quickly and accurately.
 
Second, in BI report analysis scenarios, data analysts and operations teams often encounter slow report generation, slow responses from the metrics platform, and slow rendering of the management cockpit (executive dashboard). ByteHouse addresses this in two ways. It enhances materialized views (MV) and the Projection feature to pre-aggregate complex query and calculation logic, which improves performance at the application layer, and it uses Query Cache to cache data and result sets for complex queries. At one entertainment company, ByteHouse handles 1.5 billion data records per day with a peak of 60,000 TPS, cutting report latency from the previous T+1 (next-day) model to minutes or even seconds.
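The two ideas in this scenario, pre-aggregation and result caching, can be illustrated with a minimal Python sketch. It is not ByteHouse code: the class names `PreAggregatedView` and `QueryCache`, the sample rows, and the cache key format are all hypothetical, chosen only to show why paying the aggregation cost at write time and reusing result sets makes report queries cheaper.

```python
from collections import defaultdict

# Hypothetical raw event rows: (day, channel, revenue).
rows = [
    ("2024-08-01", "app", 10.0),
    ("2024-08-01", "web", 5.0),
    ("2024-08-01", "app", 2.5),
    ("2024-08-02", "web", 7.0),
]


class PreAggregatedView:
    """Toy stand-in for a materialized view: SUM(revenue) per (day, channel)."""

    def __init__(self):
        self.totals = defaultdict(float)

    def insert(self, day, channel, revenue):
        # The aggregation cost is paid once, at write time.
        self.totals[(day, channel)] += revenue

    def revenue_by_day(self, day):
        # Reads touch only the small aggregated state, never the raw rows.
        return sum(v for (d, _), v in self.totals.items() if d == day)


class QueryCache:
    """Toy result cache keyed by query text (assumed key format)."""

    def __init__(self):
        self.results = {}
        self.hits = 0

    def get_or_compute(self, query_key, compute):
        if query_key in self.results:
            self.hits += 1
            return self.results[query_key]
        result = compute()
        self.results[query_key] = result
        return result

    def invalidate(self):
        # Call this when the underlying data changes.
        self.results.clear()


view = PreAggregatedView()
for row in rows:
    view.insert(*row)

cache = QueryCache()
key = "revenue:2024-08-01"
first = cache.get_or_compute(key, lambda: view.revenue_by_day("2024-08-01"))
second = cache.get_or_compute(key, lambda: view.revenue_by_day("2024-08-01"))
print(first, second, cache.hits)
```

The second call is served from the cache without touching the view. A real system also has to decide when cached results become stale, which this sketch reduces to a manual `invalidate()` call.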
 
Third, in online and offline complex analysis scenarios, ByteHouse enhances its self-developed optimizer (CBO, RBO, and so on) as well as Runtime Filter, a technology with a key impact on performance in large join scenarios. In addition, ByteHouse's self-developed distributed cache further reduces the performance loss caused by the storage-compute separation architecture.
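To make the Runtime Filter idea concrete, here is a minimal Python sketch of a hash join that derives a filter from the small (build) side and applies it while scanning the large (probe) side. The tables, the function name `hash_join_with_runtime_filter`, and the use of an exact key set are illustrative assumptions. Engines often use Bloom filters or min/max ranges instead, and the source does not describe how ByteHouse implements it.

```python
# Hypothetical tables: a small, already-filtered dimension table
# and a large fact table.
dim_rows = [(1, "phone"), (4, "laptop")]               # (product_id, name)
fact_rows = [(i % 10, i * 1.0) for i in range(1000)]   # (product_id, amount)


def hash_join_with_runtime_filter(build_rows, probe_rows):
    # Build phase: hash the small side on its join key.
    build_table = {}
    for key, payload in build_rows:
        build_table.setdefault(key, []).append(payload)

    # Runtime filter: the join keys actually present on the build side.
    # It is only known at execution time, hence "runtime".
    runtime_filter = set(build_table)

    # Scan phase: drop probe rows that cannot match before they reach the join.
    scanned = 0
    passed = []
    for key, amount in probe_rows:
        scanned += 1
        if key in runtime_filter:
            passed.append((key, amount))

    # Probe phase: only the surviving rows are joined.
    joined = [
        (key, name, amount)
        for key, amount in passed
        for name in build_table[key]
    ]
    return joined, scanned, len(passed)


joined, scanned, passed = hash_join_with_runtime_filter(dim_rows, fact_rows)
print(f"scanned={scanned} passed_filter={passed} joined={len(joined)}")
```

In this toy data only two of ten product IDs survive the filter, so most fact rows are discarded before the join. In a distributed engine that early discard also saves the cost of shuffling those rows across the network.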
 
Fourth, in lakehouse federated analysis scenarios, ByteHouse uses Native Reader technology to shorten IO access paths and improve external table access performance. Through further integration with the optimizer, ByteHouse continues to improve lakehouse analysis speed.
 
Fifth, in crowd selection and behavioral analysis scenarios, ByteHouse provides self-developed engines and enhancements such as BitEngine, BitMap64, and BitMap index. Through close integration with applications such as the growth analytics product DataFinder and the customer data platform VeCDP, ByteHouse has built in a large number of analytical functions for business scenarios, such as retention analysis and path analysis. Even for crowd selection over 1 billion users, P99 response time ranges from seconds down to milliseconds.
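Bitmap-based crowd selection can be sketched in a few lines of Python. This is an illustration of the general technique only, not of BitEngine itself: the helper names `to_bitmap` and `to_user_ids`, the tag names, and the user IDs are hypothetical, and a plain Python integer stands in for the compressed bitmap structures that production engines typically use.

```python
def to_bitmap(user_ids):
    """Encode non-negative user IDs as one integer bitset (bit i = user i)."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def to_user_ids(bitmap):
    """Decode a bitset back into a sorted list of user IDs."""
    ids = []
    uid = 0
    while bitmap:
        if bitmap & 1:
            ids.append(uid)
        bitmap >>= 1
        uid += 1
    return ids


def count(bitmap):
    """Number of users in the bitset."""
    return bin(bitmap).count("1")


# Hypothetical tags, one bitmap per tag.
active_day1 = to_bitmap([1, 2, 3, 5, 8])
active_day2 = to_bitmap([2, 3, 5, 13])
purchased = to_bitmap([3, 5, 21])

# Crowd selection: users active on day 1 AND who purchased.
segment = active_day1 & purchased

# Retention: users active on day 1 who came back on day 2.
retained = active_day1 & active_day2
retention_rate = count(retained) / count(active_day1)

print(to_user_ids(segment), to_user_ids(retained), retention_rate)
```

Because each tag is one bitmap, combining conditions becomes a bitwise AND or OR over whole words rather than a row-by-row join, which is what makes this approach attractive for very large user bases.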
 
Sixth, in image search scenarios, ByteHouse has introduced the Vector engine, which provides vector retrieval capability. By supporting multiple vector retrieval algorithms and an efficient execution path, ByteHouse can support large-scale vector retrieval and achieve millisecond-level query latency. At one public-opinion monitoring company, comparable products in the industry take from a few seconds to more than ten seconds per query, whereas the optimized ByteHouse finds 1,000 similar images in large-scale data and completes similarity scoring in only 150 to 200 milliseconds.
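The core operation behind vector retrieval, finding the top-k most similar vectors and scoring them, can be sketched as follows. This is a brute-force baseline under stated assumptions: the function names, the three-dimensional sample embeddings, and the choice of cosine similarity are illustrative. The source does not say which algorithms or distance metrics ByteHouse uses, and large-scale systems generally rely on approximate nearest-neighbor indexes rather than a full scan.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_similar(query, vectors, k):
    """Return the k (score, item_id) pairs most similar to the query."""
    scored = (
        (cosine_similarity(query, vec), item_id)
        for item_id, vec in vectors.items()
    )
    # nlargest keeps only k candidates in memory while scanning.
    return heapq.nlargest(k, scored)


# Hypothetical image embeddings (real ones have hundreds of dimensions).
image_vectors = {
    "img_a": [1.0, 0.0, 0.0],
    "img_b": [0.9, 0.1, 0.0],
    "img_c": [0.0, 1.0, 0.0],
    "img_d": [0.0, 0.0, 1.0],
}

results = top_k_similar([1.0, 0.0, 0.0], image_vectors, k=2)
for score, item_id in results:
    print(item_id, round(score, 4))
```

A full scan like this grows linearly with the number of stored vectors, which is why index structures that trade a little recall for much lower latency matter at the scale described above.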
 
In terms of overall performance, ByteHouse was benchmarked on the SSB, TPC-H, and TPC-DS datasets against an open source OLAP engine known for its performance, and showed significant improvements across the different queries. On the TPC-H dataset, for example, ByteHouse's query efficiency is tens of times higher than that of the benchmark product in the same hardware and software environment.
 
Currently, ByteHouse is used in industries such as Internet, gaming, finance, automotive, and meteorology, and continues to provide enterprises with strong data analysis capabilities to support their digital and intelligent transformation.
 