Hello everyone, I'm Brother V. Exporting large volumes of data with EasyExcel can easily cause a memory overflow, especially when exporting millions of rows. Have you run into this? Below are some common solutions I've put together to share with you; feel free to discuss:
Common methods for exporting large data volumes with EasyExcel
1. Batch writing
- EasyExcel supports writing data in batches: you load the data into memory batch by batch and write each batch to the Excel file, instead of loading everything into memory at once.
- Sample code:
String fileName = "large_data.xlsx";
ExcelWriter excelWriter = EasyExcel.write(fileName, Data.class).build();
WriteSheet writeSheet = EasyExcel.writerSheet("Sheet1").build();
// Assume 10,000 records are written per batch
int batchSize = 10000;
List<Data> dataList;
int pageIndex = 0;
do {
    // Fetch one page of data
    dataList = getDataByPage(pageIndex++, batchSize);
    excelWriter.write(dataList, writeSheet);
} while (dataList.size() == batchSize);
// Close resources
excelWriter.finish();
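The loop above assumes a getDataByPage helper that pulls one page of data from the database. Here is a minimal sketch using plain JDBC with LIMIT/OFFSET paging; the connection URL, table name (t_data), and column names are illustrative assumptions, not part of the original code:

// Requires: import java.sql.*; import java.util.ArrayList; import java.util.List;
private static List<Data> getDataByPage(int pageIndex, int pageSize) {
    String sql = "SELECT id, name, age FROM t_data ORDER BY id LIMIT ? OFFSET ?";
    try (Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/demo", "user", "pass");
         PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setInt(1, pageSize);
        ps.setInt(2, pageIndex * pageSize);
        try (ResultSet rs = ps.executeQuery()) {
            List<Data> list = new ArrayList<>(pageSize);
            while (rs.next()) {
                list.add(new Data(rs.getInt("id"), rs.getString("name"), rs.getInt("age")));
            }
            return list;
        }
    } catch (SQLException e) {
        throw new RuntimeException("Paged query failed", e);
    }
}

In production you would reuse a pooled connection, and for deep pages prefer keyset pagination (WHERE id > lastSeenId) over OFFSET, which gets slower as the offset grows.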
2. Set the appropriate JVM memory
- For big data export scenarios, you can try to increase the memory allocation of the JVM, for example:
java -Xms512M -Xmx4G -jar <your-app>.jar
Explanation:
- -Xms512M: sets the initial heap size to 512 MB.
- -Xmx4G: sets the maximum heap size to 4 GB.
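To confirm the flags actually took effect, you can print the JVM's heap limits at runtime. This standalone snippet uses only the standard Runtime API:

// Prints the heap limits the current JVM is running with
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.printf("Max heap: %d MB, current heap: %d MB%n",
                rt.maxMemory() / (1024 * 1024),
                rt.totalMemory() / (1024 * 1024));
    }
}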
3. Reduce the complexity of data objects
- When exporting, keep the data objects as simple as possible: avoid unnecessary nesting and drop redundant fields to reduce the memory footprint of each object, as sketched below.
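For example, a flat export model that carries only the columns the sheet needs. The class and field names below are made up for illustration; @ExcelProperty and @ExcelIgnore are EasyExcel's standard mapping annotations:

import com.alibaba.excel.annotation.ExcelIgnore;
import com.alibaba.excel.annotation.ExcelProperty;

// Flat export model: just the exported columns, no nested objects
public class UserExportRow {
    @ExcelProperty("ID")
    private Integer id;

    @ExcelProperty("Name")
    private String name;

    // Not exported, so excluded from the sheet to keep rows light
    @ExcelIgnore
    private String profileJson;

    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getProfileJson() { return profileJson; }
    public void setProfileJson(String profileJson) { this.profileJson = profileJson; }
}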
4. Disable automatic column width setting
- EasyExcel's automatic column width feature can consume a lot of memory with large data sets, because the handler tracks content lengths for every cell. Auto width is opt-in: it is enabled by registering LongestMatchColumnWidthStyleStrategy, so the memory-friendly option is simply not to register any column-width handler.
- Sample code:
EasyExcel.write(fileName, Data.class)
        // No auto-width handler (LongestMatchColumnWidthStyleStrategy) registered,
        // so columns keep the default width and no per-cell length tracking happens
        .sheet("Sheet1")
        .doWrite(dataList);
5. Export using Stream (suitable for big data)
- Write data in batches through an OutputStream to reduce memory consumption; wrapping it in a BufferedOutputStream improves performance further.
- Sample code:
try (OutputStream out = new BufferedOutputStream(new FileOutputStream(fileName))) {
    ExcelWriter excelWriter = EasyExcel.write(out, Data.class).build();
    WriteSheet writeSheet = EasyExcel.writerSheet("Sheet1").build();
    int pageIndex = 0;
    List<Data> dataList;
    do {
        dataList = getDataByPage(pageIndex++, batchSize);
        excelWriter.write(dataList, writeSheet);
    } while (dataList.size() == batchSize);
    excelWriter.finish();
} catch (IOException e) {
    e.printStackTrace();
}
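The same pattern also works when streaming directly to an HTTP response in a web application, so the file never has to be materialized on the server. A sketch, assuming the Jakarta Servlet API (javax.servlet on older stacks) and the Data model plus getDataByPage helper from above:

// Assumes: jakarta.servlet.http.HttpServletResponse, java.net.URLEncoder,
// java.nio.charset.StandardCharsets, and the Data/getDataByPage definitions above
public static void exportToResponse(HttpServletResponse response) throws IOException {
    response.setContentType("application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
    String encoded = URLEncoder.encode("large_data.xlsx", StandardCharsets.UTF_8);
    response.setHeader("Content-Disposition", "attachment; filename*=UTF-8''" + encoded);

    int batchSize = 10000;
    ExcelWriter excelWriter = EasyExcel.write(response.getOutputStream(), Data.class).build();
    WriteSheet writeSheet = EasyExcel.writerSheet("Sheet1").build();
    int pageIndex = 0;
    List<Data> dataList;
    do {
        dataList = getDataByPage(pageIndex++, batchSize);
        excelWriter.write(dataList, writeSheet);
    } while (dataList.size() == batchSize);
    excelWriter.finish(); // flushes remaining rows and completes the workbook
}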
6. Choose an appropriate data export tool
- If the data volume is extremely large, consider switching to a higher-performance export tool, such as Apache POI's SXSSFWorkbook, which is suited to exporting millions of rows but is more complex to configure and use.
Here's the kicker: how do you actually use POI's SXSSFWorkbook to export millions of rows?
A case study: exporting millions of rows with Apache POI's SXSSFWorkbook
SXSSFWorkbook is built on streaming writes: instead of loading all the data into memory, it caches rows in temporary files, which significantly reduces memory consumption and makes it well suited to million-row exports. Let's look at a full implementation example below.
The code is as follows:
import org.apache.poi.ss.usermodel.*;
import org.apache.poi.xssf.streaming.SXSSFSheet;
import org.apache.poi.xssf.streaming.SXSSFWorkbook;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class LargeDataExportExample {
    public static void main(String[] args) {
        // File output path
        String filePath = "vg_large_data_export.xlsx";
        // Export millions of rows
        exportLargeData(filePath);
    }

    private static void exportLargeData(String filePath) {
        // Batch size per write
        final int batchSize = 10000;
        // Total number of rows
        final int totalRows = 1_000_000;
        // Create the SXSSFWorkbook: only 100 rows are kept in memory; the rest are flushed to temporary files
        SXSSFWorkbook workbook = new SXSSFWorkbook(100);
        workbook.setCompressTempFiles(true); // Enable temporary file compression
        // Create the sheet
        Sheet sheet = workbook.createSheet("Large Data");
        // Create the header row
        Row headerRow = sheet.createRow(0);
        String[] headers = {"ID", "Name", "Age"};
        for (int i = 0; i < headers.length; i++) {
            Cell cell = headerRow.createCell(i);
            cell.setCellValue(headers[i]);
        }
        int rowNum = 1; // Row number where the data starts
        try {
            // Write the data batch by batch
            for (int i = 0; i < totalRows / batchSize; i++) {
                // Simulate fetching one batch of data
                List<Data> dataList = getDataBatch(rowNum, batchSize);
                // Write the batch into the Excel sheet
                for (Data data : dataList) {
                    Row row = sheet.createRow(rowNum++);
                    row.createCell(0).setCellValue(data.getId());
                    row.createCell(1).setCellValue(data.getName());
                    row.createCell(2).setCellValue(data.getAge());
                }
                // After each batch, flush written rows out of memory to prevent overflow
                // (flushRows(n) keeps only the most recent n rows in memory)
                ((SXSSFSheet) sheet).flushRows(batchSize);
            }
            // Write the workbook to the file
            try (FileOutputStream fos = new FileOutputStream(filePath)) {
                workbook.write(fos);
            }
            System.out.println("Data export completed: " + filePath);
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Dispose of the workbook and delete its temporary files
            workbook.dispose();
        }
    }
    /**
     * Simulate paged data fetching
     */
    private static List<Data> getDataBatch(int startId, int batchSize) {
        List<Data> dataList = new ArrayList<>(batchSize);
        for (int i = 0; i < batchSize; i++) {
            dataList.add(new Data(startId + i, "Name" + (startId + i), 20 + (startId + i) % 50));
        }
        return dataList;
    }
    // Data model
    static class Data {
        private final int id;
        private final String name;
        private final int age;

        public Data(int id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }

        public int getId() {
            return id;
        }

        public String getName() {
            return name;
        }

        public int getAge() {
            return age;
        }
    }
}
Let's walk through the code.
- SXSSFWorkbook: SXSSFWorkbook(100) keeps at most 100 rows in memory; anything beyond that is written to a temporary file to save memory.
- Batch writing: batchSize controls how much data is written per batch to keep memory consumption down; totalRows is set to 1,000,000, meaning one million rows are exported.
- Simulated data generation: the getDataBatch method simulates paged data access, returning one batch at a time.
- Flushing cached rows: after each batch is written, flushRows(batchSize) flushes cached rows out of memory to keep memory usage under control.
- Temporary file compression: setCompressTempFiles(true) enables compression of the temporary files to further reduce disk space usage.
Things to note
- Temporary files: SXSSFWorkbook creates temporary files in the system temp directory, so make sure there is enough disk space.
- Resource release: after the data is written, call workbook.dispose() to clean up the temporary files.
- Performance tuning: adjust batchSize and the SXSSFWorkbook in-memory row window to the machine's available memory, to avoid both frequent flushes and memory overflows.