This is the correct position for batch UPDATE!

preamble

I was recently asked by one of my buddies how to write the code for batch updates in practice.

This is quite a representative question, so I'm going to take it out today and share it with you in the hope that it will help you.

1 Scene of the crime

One morning, in my Knowledge Planet group, one of my buddies asked me a question: do you guys usually use when case for batch updates? Or is there another way to batch update?

My answer is: our planet's mall project, there are batch update code can be referred to, a lot of code in this project, you can usually look at more.

Then I posted the key code to the group, which is a business scenario for resetting user passwords in bulk:

<update  parameterType="">
    <foreach collection="list" item="entity" separator=";">
        UPDATE sys_user
        SET password = #{},update_user_id=#{},update_user_name=#{}
        <where>
            id = #{}
        </where>
    </foreach>
</update>

Some of my buddies said that it was the first time they had seen this kind of writing, and that they had gained knowledge.

If interested in Knowledge Planet, click on this:celestial body (e.g. planet, satellite etc)。

Another question is, what's the difference between the above way of writing and updating in a direct for loop?

for(UserEntity userEntity: list) {
   (userEntity);
}

A direct for loop requires multiple requests to the database, with some network overhead, and is obviously not as good as requesting the database in bulk once.

2 Other ways to write batch updates

A guy said that he had always written in case when before.

Something like the following:

<update  parameterType="">
  update sys_user
  <trim prefix="set" suffixOverrides=",">
      <trim prefix="password = case id" suffix="end,">
          <foreach collection="list" item="item">
              when #{} then #{}
          </foreach>
      </trim>
      <trim prefix="update_user_id = case id" suffix="end,">
          <foreach collection="list" item="item">
              when #{} then #{}
          </foreach>
      </trim>
      <trim prefix="update_user_name = case id" suffix="end">
          <foreach collection="list" item="item">
              when #{} then #{}
          </foreach>
      </trim>
  </trim>
  <where>
      id in (
      <foreach collection="list" separator="," item="item">
          #{}
      </foreach>
      )
  </where>
</update>

But this way of writing obviously requires splicing in a lot of conditions, is a bit complicated, and doesn't perform very well.

There are also some articles that describe that you can use in INSERT, you can add the ON DUPLICATE KEY UPDATE keyword at the end of the statement.

 <update  parameterType="">
    insert into sys_user
    (id,username,password) values
    <foreach collection="list" index="index" item="item" separator=",">
        (#{},
        #{},
        #{})
    </foreach>
    ON DUPLICATE KEY UPDATE
     password=values(password)
</update>

When inserting data, the database first determines whether the data exists, and if it does not, performs an insert operation. If it exists, the update operation is performed.

I've used this approach before, and it generally requires the creation of unique indexes.

Because many times the primary key id, which is automatically grown or generated according to the snowflake algorithm, is not the same every time, it is impossible to distinguish the uniqueness of multiple requests for the same business parameters.

Therefore, it is recommended to create a unique index to ensure the uniqueness of the business data.

For example: create a unique index for username, and when you insert, you find that username already exists, then you perform an update operation to update password.

This way of updating data in bulk, performance is better, but the average large company will rarely use, because it is very easy to deadlock problems.

Therefore, the best option for batch updating data at the moment is still the first method I described at the beginning of the article.

3 A problem has been identified

Another partner in the group, following my suggestion, tried this batch update operation of foreach in his own project, but the code reported an exception:

sql injection violation, multi-statement not allow

This exception is reported in the WallFilter of the Alibaba druid package.

It has a checkInternal method that does some checks on the sql statement and throws an exception if the conditions are not met:

And druid does not support multiple statement statements in one sql statement by default, e.g., our batch UPDATE data scenario.

In addition, MySQL also turns off bulk updating of data by default, but we can add string parameters to the url to be on the jdbc:&allowMultiQueries=true, turn on the batch update operation.

For example:

datasource:
    type: 
    druid:
      driver-class-name: 
      url: jdbc:mysql://localhost:3306/console?useUnicode=true&characterEncoding=utf-8&useSSL=false&allowMultiQueries=true
      username: root
      password: root

This change is very simple.

But how do you solve the checksum problem in WallFilter?

So, I googled it, and you can adjust the judgment logic of the filters in druid by parameters like:

spring:
  datasource:
    url: jdbc:xxx&serverTimezone=Asia/Shanghai&rewriteBatchedStatements=true&allowMultiQueries=true
    username: xxx
    password: xxx
    driver-class-name: 
    type: 
    druid:
      filter:
        wall:
          config:
            multi-statement-allow: true
            none-base-statement-allow: true

By setting multi-statement-allow and none-base-statement-allow in the filter to true, this enables the bulk update feature.

4 Never entered into force

Ordinary datasource configurations using druid are OK with tweaks like the above.

However, some partners found that our mall project, through the above two places of modification, or has been reported below the exception:

sql injection violation, multi-statement not allow

What's going on here?

A: The order table in our mall project, using shardingsphere to do the library and table, and the use of baomidou to realize the function of dynamic switching of multiple data sources:

<dependency>
    <groupId></groupId>
    <artifactId>dynamic-datasource-spring-boot-starter</artifactId>
    <version>3.1.1</version>
</dependency>

We are using the data source configuration under the baomidou package, which is in the DynamicDataSourceProperties class:

/**
 * Copyright © 2018 organization baomidou
 * <pre>
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * /licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 * <pre/>
 */
package ;

import ;
import ;
import ;
import ;
import ;
import ;
import ;
import .slf4j.Slf4j;
import ;
import ;
import ;
import ;

import ;
import ;

/**
 * DynamicDataSourceProperties
 *
 * @author TaoYu Kanyuxia
 * @see DataSourceProperties
 * @since 1.0.0
 */
@Slf4j
@Getter
@Setter
@ConfigurationProperties(prefix = )
public class DynamicDataSourceProperties {

    public static final String PREFIX = "";
    public static final String HEALTH = PREFIX + ".health";

    /**
     * The default library must be set,default (setting)master
     */
    private String primary = "master";
    /**
     * Whether to enable strict mode,default (setting)不启动. Strict mode does not match the data source directly report an error, 非严格模式下则使用default (setting)数据源primaryThe set data source
     */
    private Boolean strict = false;
    /**
     * Whether or not to usep6spyexports，default (setting)不exports
     */
    private Boolean p6spy = false;
    /**
     * Whether or not to useseata,default (setting)不使用
     */
    private Boolean seata = false;
    /**
     * Whether or not to use spring actuator Monitoring and Inspection，default (setting)不检查
     */
    private boolean health = false;
    /**
     * Each data source
     */
    private Map<String, DataSourceProperty> datasource = new LinkedHashMap<>();
    /**
     * Multiple Data Source Selection Algorithmclazz，default (setting)负载均衡算法
     */
    private Class<? extends DynamicDataSourceStrategy> strategy = ;
    /**
     * aopsectioning order，default (setting)优先级最高
     */
    private Integer order = Ordered.HIGHEST_PRECEDENCE;
    /**
     * DruidGlobal Parameter Configuration
     */
    @NestedConfigurationProperty
    private DruidConfig druid = new DruidConfig();
    /**
     * HikariCpGlobal Parameter Configuration
     */
    @NestedConfigurationProperty
    private HikariCpConfig hikari = new HikariCpConfig();

    /**
     * 全局default (setting)publicKey
     */
    private String publicKey = CryptoUtils.DEFAULT_PUBLIC_KEY_STRING;
}

This class is the configuration class for the database, and we can see that the master and druid configurations are at the same level, so change the configuration in the file to something like the following:

spring:
  application:
    name: mall-job
  datasource:
    dynamic:
      primary: master
      datasource:
        master:
          username: root
          password: 123456
          url: jdbc:mysql://localhost:3306/susan_mall?serverTimezone=Asia/Shanghai&characterEncoding=utf8&useSSL=false&zeroDateTimeBehavior=convertToNull
          driver-class-name: 
       druid:
        wall:
          multiStatementAllow: true
          noneBaseStatementAllow: true

After this change, the mall project to use foreach this batch update data function OK.

5 Lastly

This article begins with a question from a golfer and discusses four common ways to batch update:

Update one line at a time in a for loop
The foreach splices the update statement to batch update.
Use case when to make a judgment.
Use insert into on duplicate key update syntax to batch insert or batch update.

There are many ways to do this, but I personally think the best way to batch UPDATE is the 2nd way.

But need to need the place is to use foreach to do batch update, one-time update of the data should not be too much, try to control within 1000, so that the performance of the update is still good.

If the data to be updated exceeds 1000, it needs to be split into multiple batches.

In addition, if you encounter the problem of performing a batch UPDATE operation that does not support batch updates:

sql injection violation, multi-statement not allow

First of all, you have to add &allowMultiQueries=true parameter after the url of the database connection to enable the batch update operation of the data.

If you are using the druid database driver, you can adjust the parameters of the filter in the configuration file.

spring:
  datasource:
    druid:
      filter:
        wall:
          config:
            multi-statement-allow: true
            none-base-statement-allow: true

The main thing is that multi-statement-allow is set to true.

If you also use other third-party database middleware, for example, I use baomidou to implement the function of dynamic switching between multiple data sources.

At this point, you need to check its source code to confirm how its multi-statement-allow configuration parameter is configured, which may be different from druid.

One final note (ask for attention, don't patronize me)

If this article is helpful to you, or if you are inspired, help scan the QR code below to pay attention to it, your support is my biggest motivation to keep writing.
Ask for a one-click trifecta: like, retweet, and watch at.
Concerned about the public number: [Su San said technology], in the public number reply: into the big factory, you can get free access to my recent organization of 100,000 words of the interview dictionary, a lot of partners rely on this dictionary to get a number of big factory offers.