How do I keep MGR from cloning data from the Primary node?

In MGR, when a new node joins the group, it first goes through a distributed recovery phase to bring its data into consistency with the other members. During this phase, the new node randomly selects another member of the group (the Donor) from which to synchronize the missing data.

Prior to MySQL 8.0.17, there was only one way to synchronize: Binlog-based asynchronous replication. This is suitable for scenarios where the data gap is small and all the required Binlogs are still present.

Starting with MySQL 8.0.17, there is a second method: the clone plugin. The clone plugin performs a physical backup and restore, which is suitable for scenarios where the data gap is large or the required Binlogs have been purged.

Although the clone plugin greatly improves recovery efficiency, taking a physical copy is an IO-intensive operation that can easily affect the performance of the instance being copied, so we generally do not want clone operations to run on the Primary node.

But the Donor is selected at random (as demonstrated later). Is there a way to keep MGR from cloning data from the Primary node?

This article consists of the following sections:

  1. How does MGR perform a clone operation?
  2. Can the Donor be set via clone_valid_donor_list?
  3. How does MGR choose the Donor?
  4. The implementation logic of the MGR clone operation.
  5. When group_replication_advertise_recovery_endpoints takes effect.

How does MGR perform a clone operation?

At first, I assumed that MGR performed the clone by calling some internal interface of the clone plugin. In fact, MGR executes the CLONE INSTANCE command.

// plugin/group_replication/src/sql_service/sql_service_command.cc
long Sql_service_commands::internal_clone_server(
    Sql_service_interface *sql_interface, void *var_args) {
  ...
  std::string query = "CLONE INSTANCE FROM \'";
  query.append(q_user);
  query.append("\'@\'");
  query.append(q_hostname);
  query.append("\':");
  query.append(std::get<1>(*variable_args));
  query.append(" IDENTIFIED BY \'");
  query.append(q_password);
  bool use_ssl = std::get<4>(*variable_args);
  if (use_ssl)
    query.append("\' REQUIRE SSL;");
  else
    query.append("\' REQUIRE NO SSL;");

  Sql_resultset rset;
  long srv_err = sql_interface->execute_query(query, &rset);
  ...
}
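
For reference, the assembled statement looks roughly like the following; the user, host, port, and password here are made-up placeholders, not values from the source:

CLONE INSTANCE FROM 'clone_user'@'donor-host':3306 IDENTIFIED BY '******' REQUIRE NO SSL;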

Since it is the CLONE INSTANCE command that gets executed, shouldn't it be possible to set the Donor (the instance being cloned from) via the clone_valid_donor_list parameter?

Can the Donor be set via clone_valid_donor_list?

No, it can't.

After obtaining the Donor's endpoint, which consists of a hostname and port, MGR sets clone_valid_donor_list via the update_donor_list function.

In other words, the value of clone_valid_donor_list is always overwritten with the endpoint of the chosen Donor.

So explicitly setting clone_valid_donor_list in the mysql client before starting group replication has no effect.

// plugin/group_replication/src/plugin_handlers/remote_clone_handler.cc
int Remote_clone_handler::update_donor_list(
    Sql_service_command_interface *sql_command_interface, std::string &hostname,
    std::string &port) {
  std::string donor_list_query = " SET GLOBAL clone_valid_donor_list = \'";
  plugin_escape_string(hostname);
  donor_list_query.append(hostname);
  donor_list_query.append(":");
  donor_list_query.append(port);
  donor_list_query.append("\'");
  std::string error_msg;
  if (sql_command_interface->execute_query(donor_list_query, error_msg)) {
      ...
  }
  return 0;
}
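
A quick way to convince yourself: preset the variable, start group replication, and check it again after recovery. Assuming a clone was triggered, the preset value will have been replaced:

-- Preset before starting group replication: this value will be overwritten.
SET GLOBAL clone_valid_donor_list = 'primary-host:3306';
START GROUP_REPLICATION;
-- After distributed recovery via clone, the variable holds the endpoint MGR chose:
SHOW GLOBAL VARIABLES LIKE 'clone_valid_donor_list';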

Since the Donor is chosen first and clone_valid_donor_list is set only afterwards, let's look next at how MGR selects the Donor.

How does MGR choose the Donor?

MGR selects the Donor in two steps:

  1. First, it determines which nodes are suitable to be Donors; the nodes that satisfy the conditions are put into a dynamic array (m_suitable_donors). This is implemented in the Remote_clone_handler::get_clone_donors function.
  2. Second, it iterates through the nodes in m_suitable_donors in order, using each as the Donor. If the clone operation fails on the first node, the second is tried, and so on.

Let's look at the implementation details of Remote_clone_handler::get_clone_donors.

void Remote_clone_handler::get_clone_donors(
    std::list<Group_member_info *> &suitable_donors) {
// Get information about all nodes in the cluster
  Group_member_info_list *all_members_info =
      group_member_mgr->get_all_members();
  if (all_members_info->size() > 1) {
// Randomly shuffle all_members_info; this is why the Donor selection is random.
    vector_random_shuffle(all_members_info);
  }

  for (Group_member_info *member : *all_members_info) {
    std::string m_uuid = member->get_uuid();
    bool is_online =
        member->get_recovery_status() == Group_member_info::MEMBER_ONLINE;
    bool not_self = m_uuid.compare(local_member_info->get_uuid());
// Note: only versions are compared; whether the clone plugin is actually loaded is not checked.
    bool supports_clone =
        member->get_member_version().get_version() >=
            CLONE_GR_SUPPORT_VERSION &&
        member->get_member_version().get_version() ==
            local_member_info->get_member_version().get_version();

    if (is_online && not_self && supports_clone) {
      suitable_donors.push_back(member);
    } else {
      delete member;
    }
  }

  delete all_members_info;
}

The processing flow of this function is as follows:

  1. Get information about all the nodes in the cluster and store it in all_members_info.

    all_members_info is a dynamic array whose elements are stored in descending order of the node server_uuid.

  2. Randomly shuffle all_members_info via the vector_random_shuffle function.

  3. Select the nodes that are ONLINE, are not the local node, and whose version is at least 8.0.17 (and equal to the local node's version), and add them to suitable_donors.

    Why 8.0.17? Because the clone plugin was introduced in MySQL 8.0.17.

    Note that only the versions are compared here; there is no check of whether the clone plugin is actually loaded. The queries shown below illustrate how to inspect both.

The suitable_donors argument passed to this function is in fact m_suitable_donors:

get_clone_donors(m_suitable_donors);
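
To see which members would qualify, you can check each member's state and version (which is what get_clone_donors looks at) and, since the function does not verify it, confirm separately that the clone plugin is actually loaded on the Donor:

-- Candidate Donors must be ONLINE with a version >= 8.0.17 that matches the joiner:
SELECT MEMBER_HOST, MEMBER_PORT, MEMBER_STATE, MEMBER_VERSION
FROM performance_schema.replication_group_members;

-- get_clone_donors does not check this, so verify it on the Donor itself:
SELECT PLUGIN_NAME, PLUGIN_STATUS FROM information_schema.PLUGINS WHERE PLUGIN_NAME = 'clone';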

Based on the previous analysis, we can see that in MGR the Donor, i.e. the node being cloned from, is chosen at random.

Since the selection is random, it seems impossible to avoid cloning data from the Primary node.

At this point, the problem looks unsolvable.

Don't worry; let's analyze the implementation logic of the MGR clone operation.

MGR Clone Operation Implementation Logic

The MGR clone operation is implemented in the Remote_clone_handler::clone_thread_handle function.

// plugin/group_replication/src/plugin_handlers/remote_clone_handler.cc
[[noreturn]] void Remote_clone_handler::clone_thread_handle() {
  ...
  while (!empty_donor_list && !m_being_terminated) {
    stage_handler.set_completed_work(number_attempts);
    number_attempts++;

    std::string hostname("");
    std::string port("");
    std::vector<std::pair<std::string, uint>> endpoints;

    mysql_mutex_lock(&m_donor_list_lock);
// m_suitable_donors holds all the nodes eligible to be a Donor
    empty_donor_list = m_suitable_donors.empty();
    if (!empty_donor_list) {
// Get the first element of the array
      Group_member_info *member = m_suitable_donors.front();
      Donor_recovery_endpoints donor_endpoints;
// Get Donor's endpoint information
      endpoints = donor_endpoints.get_endpoints(member);
      ...
// Remove the first element from the array
      m_suitable_donors.pop_front();
      delete member;
      empty_donor_list = m_suitable_donors.empty();
      number_servers = m_suitable_donors.size();
    }
    mysql_mutex_unlock(&m_donor_list_lock);

    // No valid donor in the list
    if (endpoints.size() == 0) {
      error = 1;
      continue;
    }
// Loop over each of the endpoints
    for (auto endpoint : endpoints) {
      hostname.assign(endpoint.first);
      port.assign(std::to_string(endpoint.second));

// set clone_valid_donor_list
      if ((error = update_donor_list(sql_command_interface, hostname, port))) {
        continue; /* purecov: inspected */
      }

      if (m_being_terminated) goto thd_end;

      terminate_wait_on_start_process(WAIT_ON_START_PROCESS_ABORT_ON_CLONE);
// Perform a cloning operation
      error = run_clone_query(sql_command_interface, hostname, port, username,
                              password, use_ssl);

      // Even on critical errors we continue as another clone can fix the issue
      if (!critical_error) critical_error = evaluate_error_code(error);

      // On ER_RESTART_SERVER_FAILED it makes no sense to retry
      if (error == ER_RESTART_SERVER_FAILED) goto thd_end;

      if (error && !m_being_terminated) {
        if (evaluate_server_connection(sql_command_interface)) {
          critical_error = true;
          goto thd_end;
        }

        if (group_member_mgr->get_number_of_members() == 1) {
          critical_error = true;
          goto thd_end;
        }
      }

// On success, exit the loop; on failure, try the next endpoint.
      if (!error) break;
    }

// On success, exit; on failure, loop back and try the next Donor.
    if (!error) break;
  }
...
}

The processing flow of this function is as follows:

  1. First, a Donor is selected. As the code shows, front() simply takes the first element of m_suitable_donors.
  2. The Donor's endpoint information is obtained.
  3. The endpoints are iterated over in a loop.
  4. clone_valid_donor_list is set to the current endpoint.
  5. The clone operation is performed. If it fails, a retry is attempted with the next endpoint; once all endpoints of the current Donor have been tried without success, the next Donor is selected, and so on until all Donors have been exhausted.

Of course, retries are conditional; no retry is attempted in the following cases:

  1. error == ER_RESTART_SERVER_FAILED: the instance failed to restart.

    Restarting the instance is the final step of a clone operation; the steps before it, in order, are: 1. acquire the backup lock; 2. drop the user tablespaces; 3. copy data from the Donor instance.

    Since all the data has already been copied, there is no point in retrying.

  2. The connection on which the clone operation was executed was killed and could not be re-established.

  3. group_member_mgr->get_number_of_members() == 1: the group has only one node.

Since a failed clone simply triggers a retry on the next candidate, the idea is straightforward: if you don't want the clone operation to execute on the Primary node, make the clone operation fail on the Primary node.

How do you make it fail?

For a clone operation to execute successfully against a Donor (the node being cloned from), the Donor must satisfy two conditions:

  1. The clone plugin is installed.
  2. The clone user has the BACKUP_ADMIN privilege.

So, to make the clone operation fail, just break either condition. The first is recommended, i.e. do not install, or uninstall, the clone plugin.

Why is revoking the privilege not recommended?

Because uninstalling the clone plugin (UNINSTALL PLUGIN clone) is not written to the Binlog, whereas revoking a privilege is.

The privilege could also be revoked with SET SQL_LOG_BIN = 0 so that the REVOKE is not logged, but that leaves the privilege tables inconsistent across the nodes of the group, so this way of revoking the privilege is strongly discouraged.
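
For completeness, the discouraged approach would look like the following sketch, where 'clone_user' stands in for whatever account performs the clone. The unlogged REVOKE takes effect only on the node where it is run, which is exactly why the grant tables diverge:

-- Discouraged: the unlogged REVOKE is local to this node only.
SET SQL_LOG_BIN = 0;
REVOKE BACKUP_ADMIN ON *.* FROM 'clone_user'@'%';
SET SQL_LOG_BIN = 1;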

So, if you don't want MGR to clone data from the Primary node, simply uninstall the clone plugin on the Primary node.
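
Concretely, that is just the following; mysql_clone.so is the plugin library name on Linux (mysql_clone.dll on Windows):

-- On the Primary only: any CLONE INSTANCE targeting this node will now fail.
UNINSTALL PLUGIN clone;

-- To allow the Primary to serve as a Donor again later:
INSTALL PLUGIN clone SONAME 'mysql_clone.so';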

The problem is solved, but one question remains: why can endpoints contain multiple entries? Shouldn't there be just one, the Donor's instance address? This is related to group_replication_advertise_recovery_endpoints.

group_replication_advertise_recovery_endpoints

The group_replication_advertise_recovery_endpoints parameter, introduced in MySQL 8.0.21, allows a node to advertise custom addresses for distributed recovery.

Look at the example below.

group_replication_advertise_recovery_endpoints= "127.0.0.1:3306,127.0.0.1:4567,[::1]:3306,localhost:3306"

When setting this parameter, each port must be one of the node's port, report_port, or admin_port.

The hostname only needs to be a valid address on the server (a server may have multiple NICs and hence multiple IPs); it does not have to appear in bind_address or admin_address.

In addition, if distributed recovery is to go through the admin_port, the recovery user must be granted the SERVICE_CONNECTION_ADMIN privilege, as shown below.
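
For example, assuming the account used for distributed recovery is named 'recovery_user' (a hypothetical name):

GRANT SERVICE_CONNECTION_ADMIN ON *.* TO 'recovery_user'@'%';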

Now let's look at when group_replication_advertise_recovery_endpoints takes effect.

After selecting the Donor, MGR calls get_endpoints to obtain the Donor's endpoints.

// plugin/group_replication/src/plugin_variables/recovery_endpoints.cc
std::vector<std::pair<std::string, uint>>
Donor_recovery_endpoints::get_endpoints(Group_member_info *donor) {
  ...
  std::vector<std::pair<std::string, uint>> endpoints;
// donor->get_recovery_endpoints() is the Donor's group_replication_advertise_recovery_endpoints value
  if (strcmp(donor->get_recovery_endpoints().c_str(), "DEFAULT") == 0) {
    error = Recovery_endpoints::enum_status::OK;
    endpoints.push_back(
        std::pair<std::string, uint>{donor->get_hostname(), donor->get_port()});
  } else {
    std::tie(error, err_string) =
        check(donor->get_recovery_endpoints().c_str());
    if (error == Recovery_endpoints::enum_status::OK)
      endpoints = Recovery_endpoints::get_endpoints();
  }
  ...
  return endpoints;
}

If group_replication_advertise_recovery_endpoints is DEFAULT, the Donor's hostname and port are used as the single endpoint.

Note that the hostname and port of the node are actually MEMBER_HOST and MEMBER_PORT in performance_schema.replication_group_members.

The logic for hostname and port is as follows:

// sql/rpl_group_replication.cc
void get_server_parameters(char **hostname, uint *port, char **uuid,
                           unsigned int *out_server_version,
                           uint *out_admin_port) {
  ...
  if (report_host)
    *hostname = report_host;
  else
    *hostname = glob_hostname;

  if (report_port)
    *port = report_port;
  else
    *port = mysqld_port;
  ...
  return;
}

report_host and report_port take priority; if they are not set, mysqld's own hostname and port are used.
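
To verify which values feed MEMBER_HOST and MEMBER_PORT on a given node, you can compare the variables with what the group advertises:

-- report_host / report_port win when set; otherwise mysqld's hostname / port are used.
SHOW GLOBAL VARIABLES WHERE Variable_name IN ('report_host', 'report_port', 'hostname', 'port');

SELECT MEMBER_HOST, MEMBER_PORT FROM performance_schema.replication_group_members;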

If group_replication_advertise_recovery_endpoints is not DEFAULT, the value of the parameter itself is used as the endpoints.

So a node's group_replication_advertise_recovery_endpoints setting only takes effect once that node is selected as a Donor.

Conversely, whether a node has group_replication_advertise_recovery_endpoints set has no bearing on whether it can be selected as a Donor.

Summary

  1. MGR's selection of the Donor is random.
  2. Before performing a clone, MGR sets clone_valid_donor_list to the endpoint of the chosen Donor, so explicitly setting clone_valid_donor_list in the mysql client before starting group replication has no effect.
  3. To perform the clone operation, MGR actually executes the CLONE INSTANCE command.
  4. MEMBER_HOST and MEMBER_PORT in performance_schema.replication_group_members use report_host and report_port when set, falling back to mysqld's hostname and port.
  5. A node's group_replication_advertise_recovery_endpoints setting only takes effect when that node is selected as a Donor.
  6. If you do not want MGR to clone data from the Primary node, simply uninstall the clone plugin on the Primary node.

Extended Reading

  1. MySQL 8.0 New Features - Clone Plugin
  2. MySQL in Action Group Replication Chapter