
A Java performance tuning practice [code + JVM, 70% performance improvement]



This is the first time I've tuned a system at both the code and the JVM level. If you read to the end, it may help your daily development and let you avoid the low-level mistakes I made. The code being tuned is the report analysis feature of an event-tracking (buried point) system. We're a small company, and there was no code review after development finished, so some of the problems below might have been caught in a code review.

The tools used for this tuning are JProfiler, Arthas, and Visual VM.

Arthas is used to view method execution times, while JProfiler and VisualVM are used to observe heap memory changes, GC activity graphs, and live object counts and sizes while the JVM runs.

Both before and after tuning: JDK 8, a 500 MB heap, and the Serial GC. For the same data volume, the system's response time improved by 70%.

Feature overview

Before starting, here's a brief overview of the system to make the rest of the article easier to follow.

The system is an event-tracking (buried point) system: after clients report tracking data, you can run reports and analyses on it in the admin backend, for example retention, active user counts, or the gender ratio of game players.

The logic of the report analysis function is:

  1. Data fetching: pull tracking data from the database based on the analysis time range.

  2. Global filtering: apply global filters to the tracking data pulled from the database. Global filters can combine conditions on every parameter of the tracking data with multiple AND/OR operations. Only the globally filtered tracking data is used for the subsequent dimension and metric calculations.

  3. Dimension calculation: the dimension (similar to the X-axis value in an XY chart) is computed by grouping the queried tracking data by a grouping field; the value of that field becomes the dimension value. Dimension values can be numeric, date, or string. Depending on the type, they can be sorted ascending or descending, and dimensions come in one-dimensional and two-dimensional forms.

  4. Metric calculation: grouping the tracking data by the grouping field yields groups that each contain multiple records. A metric is a numeric computation over all the records in a group, optionally with filters (for example, only counting users whose gender is "female"). A report can contain multiple metrics.

  5. Data assembly and front-end presentation.

That is the main flow of the report function. The response timeouts of this function were caused by dimension sorting, tracking-data filtering, and metric calculation.
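To make the dimension step concrete, here is a minimal sketch of the grouping behind dimension calculation, using simplified, hypothetical types (the real entity classes are not shown in this article): each distinct value of the grouping field becomes one dimension value.

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DimensionGroupingSketch {

    // Simplified stand-in for the system's tracking-data entity
    public static class TrackData {
        private final Map<String, String> params;
        public TrackData(Map<String, String> params) { this.params = params; }
        public String get(String field) { return params.getOrDefault(field, ""); }
    }

    // Group records by the configured grouping field; the map keys are the
    // dimension values, and each group feeds that dimension's metrics.
    public static Map<String, List<TrackData>> groupByDimension(List<TrackData> data, String groupingField) {
        return data.stream().collect(Collectors.groupingBy(t -> t.get(groupingField)));
    }
}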

A colleague doing report analysis in the backend found that once the analyzed data volume exceeded 30,000 records, the interface timed out (30 s). I initially suspected the database reads, but the service logs showed that the SQL executed during the request was not time-consuming and that it used the indexes, so the database side could be ruled out directly.

Dimensional optimization

Locating the problem code

Further analysis of the logs showed that processing the dimension data was very time-consuming: the dimension calculation itself is very simple, yet it accounted for a large share of the time, which was suspicious.

From the log above, one dimension took 331 ms to analyze 197,554 records, while another took 161 ms for only 4,012 records. One- and two-dimensional analysis execute the same code, yet the per-record cost differs enormously.

The code to perform dimensional analysis is:

public List<DimensionData> query(DimensionSetting dimensionSetting, TrackReportSetting trackReportSetting, List<TrackData> trackDataList) {
    List<DimensionData> dimensionDataList = null;
    try {
		    // doQuery groups the tracking data by the grouping field
        dimensionDataList = doQuery(dimensionSetting, trackReportSetting, trackDataList);
        dimensionDataList = applyDimensionIntervalRange(dimensionDataList, dimensionSetting);
        // Sorting the grouped dimension values
        sortDimension(dimensionDataList, dimensionSetting);
    } finally {
        completeQuery(dimensionDataList);
    }
    return dimensionDataList;
}

Because this part of the code is so simple, I wasn't sure at first what the cause was. Later, through Arthas, I found that the high cost came from executing sortDimension(dimensionDataList, dimensionSetting).

private void sortDimension(List<DimensionData> dimensionDataList, DimensionSetting dimensionSetting) {
    dimensionDataList.sort(new DimensionData());
    AtomicLong atomicSortValue = new AtomicLong(0L);
    dimensionDataList.forEach(t -> {
        t.setSortValue(atomicSortValue.addAndGet(1L));
    });
}

// Note: many identifiers in this snippet were lost in translation;
// names such as SortType and the getters are reconstructed from context.
public class DimensionData implements Comparator<DimensionData> {
    private DimensionSetting dimensionSetting;
    private String dimensionLabel;
    // Other fields omitted

    @Override
    public int compare(DimensionData data1, DimensionData data2) {
        DimensionSetting dimensionSetting = data1.getDimensionSetting();
        Assert.isTrue(dimensionSetting != null, () -> new XcyeDataException("Buried dimension settings cannot be empty!"));
        SortType sortType = dimensionSetting.getSortType();
        if (SortType.DEFAULT == sortType) {
            Comparator<String> naturalOrder = Comparator.naturalOrder();
            return naturalOrder.compare(data1.getDimensionLabel(), data2.getDimensionLabel());
        }
        String comparableLabel1;
        String comparableLabel2;
        if (SortType.DESC == sortType) {
            // descending order
            comparableLabel1 = data2.getDimensionLabel();
            comparableLabel2 = data1.getDimensionLabel();
        } else {
            // ascending order
            comparableLabel1 = data1.getDimensionLabel();
            comparableLabel2 = data2.getDimensionLabel();
        }

        Object comparableObj;
        Comparable comparable;
        if (isNumeric(comparableLabel1) && isNumeric(comparableLabel2)) {
            // First try to treat both labels as numbers
            comparable = new BigDecimal(comparableLabel1);
            comparableObj = new BigDecimal(comparableLabel2);
        } else if ((isDate(comparableLabel1, "yyyy-MM-dd") || isDate(comparableLabel1, "yyyy-MM-dd HH:mm:ss"))
                && (isDate(comparableLabel2, "yyyy-MM-dd") || isDate(comparableLabel2, "yyyy-MM-dd HH:mm:ss"))) {
            // Then try to treat both labels as dates
            comparable = new Date(parseDate(comparableLabel1).getTime());
            comparableObj = new Date(parseDate(comparableLabel2).getTime());
        } else {
            // Otherwise always treat them as plain strings
            comparable = comparableLabel1;
            comparableObj = comparableLabel2;
        }
        return comparable.compareTo(comparableObj);
    }
}

The main culprit is the compare(DimensionData data1, DimensionData data2) method. When it was first designed, there was no option to set the dimension value's type, even though in practice there are only three kinds of dimension values: numeric strings, date strings, and plain strings.

The compare method first tries to parse the strings as numbers; if that fails, it tries to parse them as dates; and if that also fails, it treats them as plain strings.

But each numeric or date parsing attempt creates extra objects inside the method, and the number of objects created grows with the number of dimension values (a sort performs many comparisons), so this code already has plenty of problems.

Adding the relevant JVM logging parameters confirms that multiple GCs are triggered while the sort runs. The figure below also shows that GC is very frequent during the execution of compare, with application threads paused about 5 ms per GC on average.
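For reference, on JDK 8 this GC activity can be surfaced with logging flags such as the following (one common combination, not necessarily the exact set used here):

-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -Xloggc:gc.log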

Optimized sorting

Now that the cause of compare's long run time is known, it just needs to be fixed. I optimized it with a cache plus an explicit dimension value type provided by the caller. The optimized code is:

// Note: identifiers reconstructed from context, as in the snippet above.
public class DimensionData implements Comparator<DimensionData> {
    private DimensionSetting dimensionSetting;
    private String dimensionLabel;
    private ConcurrentHashMap<String, Object> localComparatorCache;

    @Override
    public int compare(DimensionData data1, DimensionData data2) {
        DimensionSetting dimensionSetting = data1.getDimensionSetting();
        Assert.isTrue(dimensionSetting != null, () -> new XcyeDataException("Buried dimension settings cannot be empty!"));
        SortType sortType = dimensionSetting.getSortType();
        if (SortType.DEFAULT == sortType) {
            Comparator<String> naturalOrder = Comparator.naturalOrder();
            return naturalOrder.compare(data1.getDimensionLabel(), data2.getDimensionLabel());
        }

        String comparatorLabel;
        String comparedLabel;
        if (SortType.DESC == sortType) {
            // descending order
            comparatorLabel = data2.getDimensionLabel();
            comparedLabel = data1.getDimensionLabel();
        } else {
            // ascending order
            comparatorLabel = data1.getDimensionLabel();
            comparedLabel = data2.getDimensionLabel();
        }

        Object comparedValue = parseComparedValue(comparedLabel, dimensionSetting, data1);
        Comparable<Object> comparable = (Comparable<Object>) parseComparedValue(comparatorLabel, dimensionSetting, data2);
        return comparable.compareTo(comparedValue);
    }

    private Object parseComparedValue(String comparatorLabel, DimensionSetting dimensionSetting, DimensionData dimensionData) {
        ConcurrentHashMap<String, Object> localComparatorCacheTemp = dimensionData.getLocalComparatorCache();
        if (localComparatorCacheTemp == null) {
            throw new XcyeOtherException("localComparatorCache cannot be empty!");
        }

        FieldValueType fieldValueType = dimensionSetting.getFieldValueType();
        if (fieldValueType == null || FieldValueType.STRING == fieldValueType) {
            return comparatorLabel;
        }

        Object value = localComparatorCacheTemp.get(comparatorLabel);
        if (value == null) {
            try {
                if (FieldValueType.NUMBER == fieldValueType) {
                    value = new BigDecimal(comparatorLabel);
                    localComparatorCacheTemp.put(comparatorLabel, value);
                } else if (FieldValueType.DATE == fieldValueType) {
                    value = parseDate(comparatorLabel);
                    localComparatorCacheTemp.put(comparatorLabel, value);
                } else {
                    value = comparatorLabel;
                }
            } catch (Exception e) {
                log.error("Error when converting dimension labels for sorting: {}", e.getMessage(), e);
                return comparatorLabel;
            }
        }
        return value;
    }
}

localComparatorCache is a local cache that is removed once the dimension computation completes. Because the front end now passes the dimension value's type, we no longer have to guess the type by trial parsing, and no extra throwaway objects are created.

The results of the run after optimization are:

JVM comparison before and after optimization:

                 Before optimization                              After optimization
Number of GCs    3 GCs on average per execution of compare        a single GC, only after all executions finish

Metric optimization

Optimization 1

The metrics part is more complex than the dimensions part because it involves data filtering, secondary computations, formula parsing, and so on. It runs multi-threaded.

Let's look at the pre-optimization run log first (the elapsed times are just as dramatic):

Entry method:

public List<DimensionMetricData> doQuery(MetricSetting metricSetting, List<DimensionData> dimensionDataList, int leftOffset, int rightOffset) {
    // Query the event parameter set
    for (int i = leftOffset; i <= rightOffset; i++) {
        DimensionData dimensionData = dimensionDataList.get(i);
        TrackDataCalculatorDTO trackDataCalculator = new TrackDataCalculatorDTO();
        trackDataCalculator.setTrackDataList(dimensionData.getTrackDataList());

        // Performance problem: a new calculator is created on every iteration
        MetricCalculator metricCalculator = new TrackFormulaMetricCalculator(metricSetting, dimensionData);
        Double metricValue = (Double) metricCalculator.calculate(trackDataCalculator);

        // Performance problem: pattern and format objects are rebuilt per call
        String formatValue = MathUtils.doubleFormat(metricValue, metricSetting.getMaximumFractionDigits());
        // The rest of the code
    }
}

// Utility class (identifiers reconstructed from context)
public class MathUtils {
    /**
     * Format the decimal part of a number
     */
    public static String doubleFormat(double num1, int maximumFractionDigits) {
        return numberFormat(getFormatPattern(maximumFractionDigits), num1);
    }

    private static String getFormatPattern(int maximumFractionDigits) {
        StringBuilder format = new StringBuilder("#.");
        for (int i = 0; i < maximumFractionDigits; i++) {
            format.append("#");
        }
        return format.toString();
    }

    private static String numberFormat(String pattern, double num) {
        NumberFormat instance = NumberFormat.getInstance();
        if (!(instance instanceof DecimalFormat)) {
            throw new XcyeOtherException("Unexpected NumberFormat implementation");
        }
        DecimalFormat decimalFormat = (DecimalFormat) instance;
        decimalFormat.setRoundingMode(RoundingMode.HALF_UP);
        decimalFormat.applyPattern(pattern);
        return decimalFormat.format(num);
    }
}

The code above is not the cause of the slow metric calculation; I'm showing it because it has optimization opportunities that can be taken without affecting the system's normal operation.

As the code shows, each iteration of the for loop creates both a TrackFormulaMetricCalculator and a TrackDataCalculatorDTO. The TrackDataCalculatorDTO exists to carry the parameters needed for the calculation; the TrackFormulaMetricCalculator instantiates a formula calculator, receiving some parameters through its constructor.

At first glance this looks fine, but with the for loop and doQuery being called many times, it creates a large number of objects at runtime, and if the heap allocation is small, that inevitably triggers GC.

TrackFormulaMetricCalculator#calculate() depends on several objects, but no other class reads any state from TrackFormulaMetricCalculator. That means the class can be used as a utility or effectively static class: its instance is thread-safe, and its methods can be called from anywhere, provided the methods do not depend on instance fields.

Based on this analysis, I removed the instance fields from TrackFormulaMetricCalculator and passed those values into calculate as parameters instead, which lets TrackFormulaMetricCalculator be created once as a shared constant.

As for TrackDataCalculatorDTO, its role is to carry the intermediate, non-return-value data produced during execution; since local variables inside doQuery are thread-confined, the DTO can be created just once per doQuery call and reused across iterations.

In the MathUtils utility class, getFormatPattern(int maximumFractionDigits) builds a new pattern string on every call, yet there are typically only one to four decimal digits, so getFormatPattern can first look the pattern up in a cache for the requested number of digits and only build it on a miss.

numberFormat can likewise consult a cache: look up the DecimalFormat for the pattern first and only construct one on a miss, which avoids repeatedly creating NumberFormat instances and re-applying patterns.

The optimized code is:

private static final MetricCalculator METRIC_FORMULA_CALCULATOR_INSTANCE = new TrackFormulaMetricCalculator();

@Override
public List<DimensionMetricData> doQuery(MetricSetting metricSetting, List<DimensionData> dimensionDataList, int reportId,
                                         TrackEvent trackEvent, List<TrackParameter> trackParameterList) {
    TrackDataCalculatorDTO trackDataCalculator = new TrackDataCalculatorDTO();
    trackDataCalculator.setReportId(reportId);

    // Query the event parameter set; the DTO is created once and reused
    for (DimensionData dimensionData : dimensionDataList) {
        trackDataCalculator.setTrackDataList(dimensionData.getTrackDataList());
        trackDataCalculator.setDimensionData(dimensionData);
        trackDataCalculator.setMetricSetting(metricSetting);
        trackDataCalculator.setTrackParameterList(trackParameterList);

        Double metricValue = (Double) METRIC_FORMULA_CALCULATOR_INSTANCE.calculate(trackDataCalculator);
        String formatValue = MathUtils.doubleFormat(metricValue, metricSetting.getMaximumFractionDigits());
        // The rest of the code
    }
}
// Utility class after optimization
public class MathUtils {

    private static final ConcurrentHashMap<Integer, String> FORMAT_PATTERN_CACHE = new ConcurrentHashMap<>();

    private static final ConcurrentHashMap<String, DecimalFormat> DECIMAL_FORMAT_CACHE = new ConcurrentHashMap<>();

    public static String doubleFormat(double num1, int maximumFractionDigits) {
        return numberFormat(getFormatPattern(maximumFractionDigits), num1);
    }

    public static String doubleToPercent(double num1, int maximumFractionDigits) {
        String formatPattern = getFormatPattern(maximumFractionDigits);
        if (formatPattern == null || formatPattern.isEmpty()) {
            formatPattern = ".#";
        }
        return numberFormat(formatPattern + "%", num1);
    }

    private static String getFormatPattern(int maximumFractionDigits) {
        String formatCache = FORMAT_PATTERN_CACHE.get(maximumFractionDigits);
        if (formatCache != null) {
            return formatCache;
        }
        StringBuilder format = new StringBuilder("#.");
        for (int i = 0; i < maximumFractionDigits; i++) {
            format.append("#");
        }
        formatCache = format.toString();
        FORMAT_PATTERN_CACHE.put(maximumFractionDigits, formatCache);
        return formatCache;
    }

    private static String numberFormat(String pattern, double num) {
        DecimalFormat decimalFormat = DECIMAL_FORMAT_CACHE.get(pattern);
        if (decimalFormat == null) {
            NumberFormat numberFormat = NumberFormat.getInstance();
            if (!(numberFormat instanceof DecimalFormat)) {
                throw new XcyeOtherException("Unexpected NumberFormat implementation");
            }
            decimalFormat = (DecimalFormat) numberFormat;
            decimalFormat.setRoundingMode(RoundingMode.HALF_UP);
            decimalFormat.applyPattern(pattern);
            DECIMAL_FORMAT_CACHE.put(pattern, decimalFormat);
        }
        return decimalFormat.format(num);
    }
}
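One caveat about this cache: java.text.DecimalFormat is not thread-safe, and the metric calculation runs multi-threaded, so sharing cached instances across threads can produce corrupted output under contention. A safer variant (a sketch, with a hypothetical helper name) keeps one cache per thread:

import java.text.DecimalFormat;
import java.util.HashMap;
import java.util.Map;

public final class ThreadSafeFormats {

    // One DecimalFormat cache per thread, so no instance is ever shared
    private static final ThreadLocal<Map<String, DecimalFormat>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    public static String format(String pattern, double num) {
        DecimalFormat df = CACHE.get().computeIfAbsent(pattern, DecimalFormat::new);
        return df.format(num);
    }
}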

Optimization 2

The core method for calculating metric data is calculate:

// Note: identifiers reconstructed from context; the originals were lost in translation.
public class TrackFormulaMetricCalculator {

    public Double calculate(TrackDataCalculatorDTO trackDataCalculator) {
        // The metric's calculation formula
        String formula = trackDataCalculator.getMetricSetting().getFormula();
        List<String> formulaElementList = trackDataCalculator.getMetricSetting().getFormulaElementList();

        // Assemble the Aviator parameters according to the formula
        Map<String, Object> env = new HashMap<>();
        Map<String, String> adapterAviatorFormulaElementMap = adapterAviatorFormulaElementMap(formulaElementList);
        for (String formulaElement : formulaElementList) {
            // Skip built-in formula elements
            if (INTERNAL_FORMULA_ELEMENT_SET.contains(formulaElement)) {
                continue;
            }

            // Parse the formula expression
            FormulaExpression formulaExpression = resolveFormulaExpression(formulaElement);

            // Compute
            Double computeValue = computeFormulaResult(formulaExpression, trackDataCalculator.getDimensionData(), trackDataCalculator);
            env.put(adapterAviatorFormulaElementMap.get(formulaElement), computeValue);
        }

        // Sort so that longer elements are replaced first
        formulaElementList.sort((s1, s2) -> {
            boolean s1Status = s1.startsWith("{");
            boolean s2Status = s2.startsWith("{");
            if (s1Status && s2Status) {
                return s2.length() - s1.length();
            }
            if (s1Status) {
                return -1;
            }
            if (s2Status) {
                return 1;
            }
            return s2.length() - s1.length();
        });
        // Formula replacement
        for (String formulaElement : formulaElementList) {
            if (INTERNAL_FORMULA_ELEMENT_SET.contains(formulaElement)) {
                continue;
            }
            formula = formula.replace(formulaElement, adapterAviatorFormulaElementMap.get(formulaElement));
        }

        Expression expression = null;
        try {
            expression = AviatorEvaluator.compile(formula, false);
        } catch (Exception e) {
            log.error(e.getMessage(), e);
            throw new XcyeOtherException("Compiling formula " + formula + " failed!");
        }
        Object executeResult = expression.execute(env);
        // Convert the result to a Double
        return parseDouble(executeResult);
    }
}

The code above calls quite a few methods and the problem isn't visible at a glance, so here we can use Alibaba's Arthas to see how much time each method in the class consumes.

After starting Arthas, trace the calculate, computeFormulaResult, and filterTrackDataByConditionalExpress methods with the trace command.

`---ts=2024-07-06 18:23:38;thread_name=xxx-track-event-metric-compute-7;id=508;is_daemon=true;priority=5;TCCL=@50a095cb
    `---[0.7025ms] :calculate()
        `---[99.19% 0.6968ms ] :calculate() #39
            `---[99.17% 0.691ms ] :calculate()
                +---[0.41% 0.0028ms ] :getFormula() #100
                +---[0.30% 0.0021ms ] :getFormulaElementList() #101
                +---[0.96% 0.0066ms ] :adapterAviatorFormulaElementMap() #105
                +---[8.41% min=0.0117ms,max=0.0318ms,total=0.0581ms,count=3] :resolveFormulaExpression() #113
                +---[29.87% min=0.0409ms,max=0.1144ms,total=0.2064ms,count=3] :computeFormulaResult() #116
                |   `---[90.99% min=0.0349ms,max=0.1079ms,total=0.1878ms,count=3] :computeFormulaResult()
                |       +---[2.77% min=0.0017ms,max=0.0018ms,total=0.0052ms,count=3] :getTrackDataList() #282
                |       +---[58.15% min=0.0078ms,max=0.0935ms,total=0.1092ms,count=3] :filterTrackDataByConditionalExpress() #281
                |       |   `---[84.34% min=0.0024ms,max=0.0873ms,total=0.0921ms,count=3] :filterTrackDataByConditionalExpress()
                |       |       +---[2.06% 0.0019ms ] :getConditionList() #315
                |       |       +---[78.39% 0.0722ms ] :filter() #322
                |       |       `---[2.71% 0.0025ms ] :getType() #324
                |       +---[3.19% min=0.0019ms,max=0.0022ms,total=0.006ms,count=3] :setFinallyComputeTrackDataList() #283
                |       +---[3.57% min=0.0032ms,max=0.0035ms,total=0.0067ms,count=2] :createFormulaCalculateFactory() #291
                |       `---[10.06% min=0.0047ms,max=0.0142ms,total=0.0189ms,count=2] :calculate() #292
                +---[49.58% 0.3426ms ] :compile() #145
                +---[2.13% 0.0147ms ] :execute() #150
                `---[0.82% 0.0057ms ] :parseDouble() #152

How do you find a lead with Arthas?

Analyze the trace output and focus on the calls that consume the largest share of time.

As shown above, when calculate executes, most of the time goes into the computeFormulaResult and AviatorEvaluatorInstance:compile() methods.

Let's first analyze computeFormulaResult: 58% of its execution goes into filterTrackDataByConditionalExpress(), which ultimately calls TrackDataFilterUtil#filter().

Using trace -E TrackDataFilterUtil 'filter|getFilterTrackDataExpress' -n 20, this method and the methods it calls are traced, with the following results:

---ts=2024-07-06 18:42:48;thread_name=xxx-track-event-metric-compute-4;id=501;is_daemon=true;priority=5;TCCL=@50a095cb
    `---[0.0843ms] :filter()
        +---[17.08% 0.0144ms ] :getFilterTrackDataExpress() #66
        |   `---[78.47% 0.0113ms ] :getFilterTrackDataExpress()
        |       +---[29.20% 0.0033ms ] :parseMultiConditionExpressMap() #198
        |       +---[10.62% 0.0012ms ] org.:isDebugEnabled() #200
        |       `---[14.16% 0.0016ms ] :compile() #206
        `---[73.55% 0.062ms ] :createMultiKeyMap()
            +---[12.26% 0.0076ms ] :convertObjToMap() #122
            +---[11.13% 0.0069ms ] :createParamValueMap() #123
            +---[1.61% 0.001ms ] :getOtherData() #125
            +---[5.00% 0.0031ms ] :parseJsonToObj() #126
            +---[2.10% 0.0013ms ] :getRequestAddressInfo() #128
            +---[2.10% 0.0013ms ] :getOperationSystem() #129
            +---[1.77% 0.0011ms ] $RequestAddressInfo:getLocationAddress() #135
            +---[1.61% 0.001ms ] $OperationSystem:getName() #140
            +---[1.61% 0.001ms ] $OperationSystem:getVersion() #141
            +---[1.77% 0.0011ms ] $RequestAddressInfo:getIp() #144
            +---[1.77% 0.0011ms ] $RequestAddressInfo:getLocationAddress() #146
            +---[1.61% 0.001ms ] $LocationAddress:getCity() #146
            +---[1.77% 0.0011ms ] $RequestAddressInfo:getLocationAddress() #148
            +---[1.45% 9.0E-4ms ] $LocationAddress:getIsp() #148
            +---[1.94% 0.0012ms ] $RequestAddressInfo:getLocationAddress() #150
            +---[1.61% 0.001ms ] $LocationAddress:getCountry() #150
            +---[1.77% 0.0011ms ] $RequestAddressInfo:getLocationAddress() #152
            +---[1.61% 0.001ms ] $LocationAddress:getRegion() #152
            +---[1.77% 0.0011ms ] $RequestAddressInfo:getLocationAddress() #154
            `---[1.61% 0.001ms ] $LocationAddress:getProvince() #154

From this output we can see that TrackDataFilterUtil#filter is slow mainly because of the createMultiKeyMap and getFilterTrackDataExpress methods, so those are the two methods to try to optimize.

Don't be fooled by the small absolute time of the filter method above (I suspect the real cost is higher than shown): the filter method is called many times, and the accumulated cost still affects system performance.

parseMultiConditionExpressMap and AviatorEvaluatorInstance:compile dominate getFilterTrackDataExpress. The compile method is already optimized internally by Aviator with an LRU cache, and the main logic of parseMultiConditionExpressMap is to parse the parameters and splice them into the final expression; optimizing it would require caching, and for various reasons I ended up not optimizing this method.
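As an aside on the compile side: Aviator's compile method takes a cached flag, and when the same expression strings recur, enabling it turns recompilation into a cache lookup (a small standalone example; the formula is arbitrary):

import com.googlecode.aviator.AviatorEvaluator;
import com.googlecode.aviator.Expression;

import java.util.HashMap;
import java.util.Map;

public class AviatorCacheDemo {
    public static void main(String[] args) {
        // cached = true stores the compiled expression in Aviator's
        // internal cache, so the next compile of the same text is a lookup
        Expression expr = AviatorEvaluator.compile("a + b / 2", true);
        Map<String, Object> env = new HashMap<>();
        env.put("a", 10);
        env.put("b", 4);
        System.out.println(expr.execute(env)); // 12
    }
}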

Now all that's left is the createMultiKeyMap method, which has the code:

// Before optimization
private static Map<String, Object> createMultiKeyMap(TrackData trackData,
                                                     Map<String, TrackParameter> eventParameterMap) {
    Map<String, Object> resultMap = convertObjToMap(trackData);
    createParamValueMap(trackData, eventParameterMap, resultMap);

    // Deserializes the JSON on every call
    String otherDataJson = trackData.getOtherData();
    TrackOtherDataAdminDTO otherData = parseJsonToObj(otherDataJson, TrackOtherDataAdminDTO.class);
    // The rest of the code
}

// After optimization
private static Map<String, Object> createMultiKeyMap(TrackDataDTO trackData,
                                                     Map<String, TrackParameter> eventParameterMap) {
    Map<String, Object> resultMap = convertObjToMap(trackData);
    createParamValueMap(trackData, eventParameterMap, resultMap);

    // Reuses the object deserialized once after loading from the database
    TrackOtherDataAdminDTO otherData = trackData.getOtherDataObj();
    // The rest of the code
}

The idea behind this optimization: tracing showed that deserializing the JSON into TrackOtherDataAdminDTO is expensive. After a TrackData object is loaded from the database into memory, none of the subsequent steps modify it; its otherData and value fields hold JSON, and the deserialized forms of those two fields are used frequently during calculation but never changed.

The previous logic deserialized otherData and value every time they were used, so the same JSON was deserialized repeatedly.

After the optimization, once the data is loaded from the database, the otherData and value fields of TrackData are deserialized on demand and the results are stored in TrackDataDTO, a subclass of TrackData, which removes the repeated deserialization.
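A minimal sketch of that idea; the helper names (getOtherDataObj, JsonUtil.parse) are hypothetical stand-ins for the project's real accessors and JSON utility:

public class TrackDataDTO extends TrackData {

    // Deserialized once, then reused by every later calculation step
    private TrackOtherDataAdminDTO otherDataObj;

    public TrackOtherDataAdminDTO getOtherDataObj() {
        if (otherDataObj == null) {
            // getOtherData() returns the raw JSON string stored in MySQL
            otherDataObj = JsonUtil.parse(getOtherData(), TrackOtherDataAdminDTO.class);
        }
        return otherDataObj;
    }
}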

With the above optimization, the changes before and after optimization are as follows:

Database

Limit the maximum number of entries for a single query

For this report analysis function, I load all the data from the database into memory at once for calculation. Testing showed that loading 170,000 records takes more than 250 MB of heap; add the new objects generated by subsequent calculations and the needs of other interfaces, and heap memory has to be allocated carefully. It is necessary to cap the maximum number of rows per query.

If you're worried that a fixed cap won't match the current heap size, you can compute it dynamically with the formula: max rows = (Xmx - reserved heap size) * 1024 * 1024 / size of the entity object mapped by a single SQL row. That way a large heap isn't stuck with a small per-query cap.
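A sketch of that formula in code, with illustrative numbers for the reserved heap and per-row size (Runtime.maxMemory() approximates -Xmx):

public final class QueryLimit {

    // How many rows fit in the heap left over after reserving space for
    // the rest of the application; all sizes are in bytes.
    public static long maxRows(long reservedHeapBytes, long bytesPerRow) {
        long maxHeap = Runtime.getRuntime().maxMemory(); // roughly -Xmx
        return Math.max(0, (maxHeap - reservedHeapBytes) / bytesPerRow);
    }

    public static void main(String[] args) {
        // Example: reserve 200 MB, assume ~1.5 KB per mapped entity object
        System.out.println(maxRows(200L * 1024 * 1024, 1536));
    }
}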

You can use the ClassLayout class to calculate an entity object's size; note that to get the full footprint you also have to account for the objects its fields reference, since a field in an object stores only a reference.
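For example, with the JOL library (org.openjdk.jol:jol-core), ClassLayout reports an object's shallow size, while GraphLayout walks the references and reports the deep size; TrackData here stands in for whatever entity class is being measured:

import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.info.GraphLayout;

public class ObjectSizeDemo {
    public static void main(String[] args) {
        TrackData row = new TrackData();
        // Shallow size: header + fields (references count as pointer-sized)
        System.out.println(ClassLayout.parseInstance(row).toPrintable());
        // Deep size: the object plus everything reachable from its fields
        System.out.println(GraphLayout.parseInstance(row).totalSize() + " bytes");
    }
}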

Using other databases

We currently store the tracking data in MySQL: the reported data is serialized to JSON, saved in the value field, and deserialized when used.

As a result, when querying tracking data, we cannot use SQL to return only some of the keys inside the JSON (although MySQL can cast the string to JSON at query time).

If we could query on demand like that, memory overhead would drop to some extent. Tracking data could also be stored in a non-relational database such as MongoDB (MySQL also supports a native JSON column type).
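For reference, MySQL 5.7+ can already extract individual keys from JSON at query time with JSON_EXTRACT, returning only the needed key instead of the whole serialized blob (table, column, and connection details below are illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class JsonColumnQueryDemo {
    public static void main(String[] args) throws Exception {
        String sql = "SELECT JSON_UNQUOTE(JSON_EXTRACT(`value`, '$.name')) FROM track_data WHERE id = ?";
        try (Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/demo", "user", "pass");
             PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, 1L);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println(rs.getString(1)); // only the "name" key
                }
            }
        }
    }
}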

Response Body Optimization

Response body optimization is also essential, especially when the response body is large and server bandwidth is small; in that case I consider it a must.

Available response body optimizations:

  1. Select the appropriate response format.
  2. Use paging.
  3. For JSON and similar formats, shorten the keys when the payload contains many identical keys.
  4. Remove unneeded fields.

For report analytics, the response format I used was JSON, and it contained a lot of duplicate keys, so I shortened those duplicate keys as much as possible (readability stops mattering at that point; just document the mapping in code comments) and removed the keys the front end doesn't use.
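If the project serializes with Jackson (an assumption; the field names below are illustrative), the wire keys can be shortened while the Java fields stay readable:

import com.fasterxml.jackson.annotation.JsonProperty;

public class DimensionMetricView {

    // serialized as "dl" instead of "dimensionLabel"
    @JsonProperty("dl")
    private String dimensionLabel;

    // serialized as "mv" instead of "metricValue"
    @JsonProperty("mv")
    private Double metricValue;

    // getters and setters omitted
}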

Comparison of the response body before and after optimization (gzip compression disabled; measured with ApiFox):

The 4.82 MB figure is the result of shortening the long keys to two or three letters.

As you can see, just reworking the response body's keys cuts its size by about 50%; add response body compression on top, and the improvement in response time is substantial.

While on the topic, one more idea occurred to me, though its payoff is uncertain: when the data is large and highly duplicated, multiple duplicate entries could be merged so the backend returns only one copy, with a smaller auxiliary field telling the front end how to expand it. That would, however, make things more complicated to handle.

Enable automatic compression

Because the report analysis results are sent to the front end for display, the response body is fairly large; analyzing hundreds of thousands of records can produce an uncompressed response of tens of megabytes.

In Spring Boot, enabling response body compression only requires adding the following configuration:

server:
  compression:
    enabled: true
    mime-types: application/json
    min-response-size: 2097152 # unit: bytes

No additional response header configuration is needed. Comparison before and after enabling response body compression:

With response body compression turned on, the response body size drops from 5.1 MB to 654 KB.

Caching

Caching should not be forgotten either, but I won't go into it here.

JVM

Increase heap memory size

If a large amount of data must be loaded from the database into memory for computation, the heap size has to be set carefully. If the heap is too small, OOM is likely; even when OOM doesn't occur, the application can end up spending most of its time, over 90% in the worst case, on GC.

JVM string de-duplication

In the report analysis function, the dimension and metric calculations need the value field of the tracking data. Since the data is currently stored in MySQL as JSON strings, each use deserializes the JSON into other objects such as POJOs or Maps.

The deserialization may run many times, i.e. a string like {\"name\": \"xcye\", \"age\": 12, \"site\": \"www\"} can be parsed repeatedly, leaving multiple copies of identical strings in memory. These duplicate strings occupy heap space, and because they are still referenced, reachability analysis prevents GC from reclaiming them.

Given that, if strings could be de-duplicated during GC, more heap would be freed after each collection, which matters especially in frequent-deserialization scenarios like this one.

The G1 collector provides this: the -XX:+UseStringDeduplication flag enables the feature (it is disabled by default), and -XX:+PrintStringDeduplicationStatistics prints de-duplication statistics. String de-duplication is said to reduce heap memory pressure by around 10%. The basic principle is that many String objects are identical, so they can share a single backing character array instead of each holding its own copy.

/javase/8/docs/technotes/tools/windows/
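Note that string de-duplication only works with G1 (the measurements earlier in this article used Serial GC), so on JDK 8u20+ the start command looks something like this (the jar name is just a placeholder):

java -XX:+UseG1GC -XX:+UseStringDeduplication -XX:+PrintStringDeduplicationStatistics -jar app.jar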

Because this function produces a lot of duplicate strings during report analysis, here is one of the de-duplication records I captured:

[GC concurrent-string-deduplication, 21.1M->2127.8K(19.0M), avg 79.6%, 0.0448218 secs]
   [Last Exec: 0.0448218 secs, Idle: 0.1404486 secs, Blocked: 0/0.0000000 secs]
      [Inspected:          224587]
         [Skipped:              0(  0.0%)]
         [Hashed:          224587(100.0%)]
         [Known:                0(  0.0%)]
         [New:             224587(100.0%)     21.1M]
      [Deduplicated:       198174( 88.2%)     19.0M( 90.2%)]
         [Young:                1(  0.0%)     24.0B(  0.0%)]
         [Old:             198173(100.0%)     19.0M(100.0%)]
   [Total Exec: 36/0.1176846 secs, Idle: 36/256.6703979 secs, Blocked: 35/0.0046191 secs]
      [Inspected:          649023]
         [Skipped:              0(  0.0%)]
         [Hashed:          536577( 82.7%)]
         [Known:            48610(  7.5%)]
         [New:             600413( 92.5%)     70.3M]
      [Deduplicated:       463019( 77.1%)     55.9M( 79.6%)]
         [Young:            45391(  9.8%)   4292.6K(  7.5%)]
         [Old:             417628( 90.2%)     51.8M( 92.5%)]
   [Table]
      [Memory Usage: 5256.4K]
      [Size: 131072, Min: 1024, Max: 16777216]
      [Entries: 180582, Load: 137.8%, Cached: 0, Added: 183780, Removed: 3198]
      [Resize Count: 7, Shrink Threshold: 87381(66.7%), Grow Threshold: 262144(200.0%)]
      [Rehash Count: 0, Rehash Threshold: 120, Hash Seed: 0x0]
      [Age Threshold: 3]
   [Queue]
      [Dropped: 0]

The record above was produced the first time all the queried data's value and otherData fields (both JSON) were deserialized (into Maps and other objects), which creates a large number of duplicate keys and values.

The record also shows that 224,587 strings were newly created in this pass, of which 88.2% were duplicates; de-duplication shrank them from 21.1 MB to 2,127.8 KB. (This record represents a single pass; de-duplication keeps running as the program executes.)

With JProfiler we can also observe the heap memory behavior with -XX:+UseStringDeduplication enabled versus disabled.

Summary

This incident taught me that in everyday development, when adding features or modifying code, you must consider things comprehensively: not just the design of the code at hand, but also the impact the new feature or change may have elsewhere.

When handling large data volumes (analytics, exports, and similar requirements), don't stop once the feature passes your own functional tests: stress-test it hard on your own machine, see what happens in extreme cases, and optimize further based on the results.

When writing utility classes, I tend to overlook the performance and other problems they may cause. So utility classes deserve thorough thought, at least more care than ordinary classes (extensibility issues, performance issues, and so on).