Senior Engineer Interviews - Messaging Middleware

1.1 What are the benefits of using RabbitMQ?

1, decoupling, system A in the code directly call system B and system C code, if the future D system access, system A also need to modify the code, too much trouble!

2, asynchronous, the message will be written to the message queue, non-essential business logic to run in an asynchronous manner to speed up the response rate

3, peak-cutting, high concurrency, all the requests directly dislike to the database, resulting in database connection abnormalities

1.2 RabbitMQ Basic Concepts

1, Broker: simply put is the message queue server entity

2, Exchange: message switch, it specifies the rules according to which the message, routed to which queue

3, Queue: message queue carrier, each message will be put into one or more queues

4, Binding: Binding, its role is to exchange and queue in accordance with the routing rules bound together

5、 Routing Key： Routing keyword, exchange according to this keyword for message delivery

6, VHost: vhost can be understood as a virtual broker, i.e., mini-RabbitMQ server, which internally contains independent queue, exchange and binding, etc., but the most important thing is that it has an independent privilege system, which allows user control in the vhost range. Of course, from the global perspective of RabbitMQ, vhost can be used as a means of isolating different privileges (a typical example is that different applications can run in different vhosts).

7, Producer: message producer, is the delivery of the message program

8, Consumer: message consumer, is to accept the message program

9, Channel: message channel, in each connection of the client, you can establish multiple channels, each channel represents a session task.

It takes Exchange, Queue, and RoutingKey to determine a unique route from Exchange to Queue.

1.3 4 types of switches?

fanout:Routes all messages sent to this switch to all queues bound to this switch.

direct:Routes the message to a queue with an exact match of BindingKey and RoutingKey.

topic:

Matching rules:

RoutingKey is a dotted '.' : separated string. For example.

The BindingKey, like the RoutingKey, is also a dotted ". " string.

BindingKey can use _ and # to do fuzzy matching, _ matches one word, # matches multiple or zero.

headers:Does not rely on the routing key matching rule to route the message. It is based on the headers attribute in the content of the sent message. Poor performance, basically not used.

1.4 How to avoid double delivery or double consumption of messages?

In message production, MQ internally generates an inner-msg-id for each message sent by the producer as the basis for de-duplication and idempotence (message delivery failure and re-transmission), to avoid duplicate messages from entering the queue; in message consumption, a bizId (globally unique for the same business, e.g., payment ID, order ID, post ID, etc.) is required in the message body as the basis for de-duplication and idempotence, to avoid the same message from being consumed repeatedly. re and idempotent basis to avoid repeated consumption of the same message.

This question is answered for business scenarios in the following points:

1. Get this message to do the insert operation of the database. Then give this message to do a unique primary key, then even if there is a repeated consumption of the situation, it will lead to primary key conflict, to avoid the database dirty data.

2, get this message to do Redis set operation, because you no matter set a few times the result is the same, set operation was even idempotent operation.

3. If the above two cases do not work. Prepare a third-party medium to do consumption records. Take Redis as an example, assign a global id to the message, as long as the consumption of the message, will <id,message> in the form of K-V written into Redis. then the consumer starts to consume before, first go to Redis to query whether there is a consumption record can be.

1.5 How do I ensure that messages are not lost?

possible loss of rabbitMQ messages:

1. The producer sender sends a message but does not enter the queue.

Setting the channel to confirm mode assigns a unique ID to all messages posted on the channel, once the message has been delivered to the destination queue, or once the message has been written to disk.

(persistent message), the channel sends an acknowledgement to the producer (the unique ID of the packet).

2. The receiver receives the message, but there is an error in the processing.
When a consumer gets a message, it sends an acknowledgement ACK to RabbitMQ to tell it that the message was received. However, there are two types of acknowledgement ACKs.

Auto ACK :Once the message has been received, the consumer automatically sends an ACK

If the message is not too important and there is no effect of losing it, then an automatic ACK is more convenient

Manual ACK :After the message is received, no ACK will be sent, you need to call it manually.

-If the message is so important that it cannot be lost, then it is better to manually ACK the message after it has been consumed. If the message is too important to be lost, it is better to manually ACK the message after it has been consumed, otherwise it will be automatically ACKed when it is received and RabbitMQ will remove the message from the queue. If the consumer is down at that moment, then the message is lost.

3. Queue or column machine.
Message persistence, provided of course that the queue and switch are persistent

1.6 How rabbitmq implements a delayed message queue

To use RabbitMQ to implement a delayed task, you must first understand two concepts of RabbitMQ: the TTL of the message and the dead letter Exchange, which are used in combination to realize the above requirements. If the queue is set and the message is also set, then it will take a small one. So if a message is routed to a different queue, it is possible that the message will die at a different time (with different queue settings) O

Deferred tasks are realized by the TTL of the message and Dead Letter Exchange. We need to create two queues, one for sending messages and one for forwarding the destination queue after the message expires.

The producer outputs a message to Queuel, and this message is set to have a validity time, such as 3 minutes. The message waits in Queuel for 3 minutes, and if no consumer receives it, it is forwarded to Queue2 , which has consumers, receives it, and handles the delayed task.

1.7 How to solve the delay and expiration problems of message queues? What should I do when my message queue is full? What should I do if I have millions of messages that have been backlogged for hours?

Message backlog treatment: temporary emergency expansion:

1. Fix the consumer problem first, make sure it resumes consuming at the same rate, and then stop all existing cnosumer.

2, create a new topic, partition is the original 10 times, temporarily establish the original 10 times the number of queues.

3, and then write a temporary distribution of data consumer program, this program is deployed to consume the backlog of data, consumption is not time-consuming processing, directly polled evenly into the temporary establishment of 10 times the number of queue.

4, then temporarily requisition 10 times the number of machines to deploy consumers, each batch of consumers to consume the data of a temporary queue. This is equivalent to temporarily expanding the queue and consumer resources by 10 times to consume data at 10 times the normal rate.

5. After rapidly consuming the backlog of data, you have to revert to the originally deployed architecture and use the original consumer machines to consume the messages again.

6, MQ message expiration: Assuming you are using RabbitMQ, RabbtiMQ can set the expiration time, that is, TTL. if the message in the queue backlog more than a certain period of time will be RabbitMQ to clean up, the data is gone. That's the second pitfall. It's not so much that there's a lot of data backed up in the mq, but that a lot of data is lost. We can take a program, that is, batch re-direction, which we also have a similar scenario on the line before. That is, when there is a large backlog, we directly discard the data at that time, and then after the peak period, for example, we drink coffee together to stay up until 12:00 p.m., the users have gone to bed. At this time, we will start writing programs to write a temporary program to find out the lost data bit by bit, and then re-fill into the mq to make up for the data lost during the day. This is the only way. Suppose 10,000 orders are backlogged in the mq, not processed, of which 1,000 orders are lost, you can only manually write a program to check out the 1,000 orders, manually sent to the mq to make up again.

The mq message queue block is full:

What if the messages are backlogged in the mq and you haven't processed them for a long time, and then the mq is almost full? Is there another way to do this? No, who let your first program execution is too slow, you temporarily write the program, access to the data to consume, consume a discard one, do not want, quickly consume all the messages. Then go to the second program, to the night and then replenish the data it.

1.8 What are the conditions for a message to be reliably persisted if the queue and exchange have the durable attribute and the message has the persistent attribute?

The binding relationship can be expressed as exchange - binding - queue. From the documentation, we know that if we want to deliver a message without losing it, we need the message itself to have the persistent property set, and we need both exchange and queue to have the durable property set.

In fact, we can think of it this way. If exchange or queue does not set the durable attribute, it will not be able to recover after a crash, so even if the message has the persistent attribute set, there is still the problem that the message can be recovered but it has nowhere to go; similarly, if the message itself does not set the If the persistent attribute is not set on the message itself, there is no way to talk about message persistence.

1.9 How many clustering modes are there for RabbitMQ?

RabbitMQ is more representative, because it is based on the master-slave (non-distributed) to do high availability, we will take RabbitMQ as an example to explain how to implement the first MQ high availability.RabbitMQ has three modes: stand-alone mode, ordinary cluster mode, mirrored cluster mode.

1, stand-alone mode, that is, Demo level, generally is your local start up to play for fun? No one uses stand-alone mode for production.

2, ordinary mode: two nodes (rabbit01, rabbit02) as an example to illustrate, for Queue, the message entity exists only in one of the nodes rabbit01 (or rabbit02), rabbit01 and rabbit02 two nodes only have the same metadata, that is, the queue structure. When the message enters the Queue of rabbit01 node, and the consumer consumes it from rabbit02 node, RabbitMQ will temporarily perform a message transfer between rabbit01, rabbit02 to take out the message entity from A and send it to the consumer through B. Therefore, the consumer should try to connect to each node and take message from it. That is, for the same logical queue, the physical queue should be established in multiple nodes, otherwise, no matter the consumer connects to rabbit01 or rabbit02, the exit is always in rabbit01, which will create a bottleneck. When the rabbit01 node fails, the rabbit02 node cannot fetch the message entities that have not yet been consumed in the rabbit01 node. If message persistence is done, then it waits until the rabbit01 node recovers and then it can be consumed. Without message persistence, message loss occurs.

3, mirror mode: the need for the queue into a mirror queue, the existence of multiple nodes belonging to the RabibitMQ HA program, the mode solves the problem of the ordinary mode, the essence of the difference between the ordinary mode is that the message body will take the initiative to synchronize between the mirror nodes, rather than in the client to fetch the data when the temporary pull, the mode brings the side effects are also very obvious, in addition to reducing system performance, in addition. In addition to reducing system performance, if the number of mirror queues is too large, coupled with a large number of messages into the cluster internal network bandwidth will be greatly consumed by this synchronized communication, so in the case of high reliability requirements apply!

2.1 Concepts

Highly available, supported by almost all relevant open source software, meeting most application scenarios, especially in big data and streaming computing.

Kafka is efficient, scalable, and message persistent. Supports partitioning, replicas, and fault tolerance.
A lot of design has been done for batch and asynchronous processing, so Kafka can get very high performance.
Hundreds of thousands of asynchronous message messages are processed per second, and if compression is turned on, you can eventually reach the level of processing 2000w messages per second.
But since it is asynchronous and batch, the latency will also be high and not suitable for e-commerce scenarios.

2.2 What is kafka

Producer API: Allows applications to publish record streams to one or more Kafka topics.
Consumer API: allows an application to subscribe to one or more topics and process the stream of records generated for them.
Streams API: allows applications to act as stream processors, converting input streams to output streams.

Message

Kafka's data unit is called a message. A message can be thought of as a "data row" or a "record" in a database.

consignment

To increase efficiency, messages are written to Kafka in batches. increasing throughput but increasing response time.

Topic

Categorized by topic, similar to tables in a database.

Partition

Topics can be divided into partitions and distributed in the kafka cluster for easy scaling.

Individual partitions are ordered, and partitions are set to one to ensure global order.

Replicas

Each topic is divided into partitions with multiple copies of each partition.

Producer

The producer distributes messages evenly across all partitions of the topic by default:

Specify the partition of the message directly
Partitioning is based on the key hash of the message.
Polls the specified partition.

Consumer Comsumer

Consumers consume messages by using offsets to distinguish messages that have already been read. Storing the last read message offset for each partition on Zookeeper or Kafka prevents it from losing its read state if the consumer is shut down or restarted.

ComsumerGroup

Consumer groups ensure that each partition can only be used by one consumer to avoid duplicate consumption. If a consumer in the group fails, other consumers in the consumption group can take over the work of the failed consumer to rebalance and repartition.

Node Broker

Connecting producers and consumers, a single broker can easily handle thousands of partitions and millions of messages per second.

broker receives the message from the producer, sets the offset for the message, and commits the message to disk for safekeeping.
The broker serves the consumer, responding to requests to read partitions and returning messages that have been committed to disk.

clustering

There is a chief for every partition, and when a partition is assigned to multiple brokers, partition replication is performed through the chief.

Producer Offset

Messages are written with an offset, the latest maximum offset for each partition.

Consumer Offset

Consumers in different consumer groups can store different Offsets for a partition without affecting each other.

LogSegment

A partition consists of multiple LogSegments.
A LogSegment consists of .log .index .timeindex
The .log append is written sequentially and the file name is named after the offset of the first message in the file
Index for log deletion and data lookup can be quickly located.
.timeStamp then looks up the corresponding offset based on the timestamp.

2.3 Advantages and application scenarios

Pros:

High throughput: Single machine handles tens to millions of messages per second. Stable performance even with terabytes and messages stored.

Zero copy Reduce kernel-to-user copy, disk DMA via sendfile Copy socket buffer
Sequential Reads and Writes Leverages the ultra-high performance of sequential reads and writes to disk
Page cache mmap, maps disk files to memory, users can modify disk files by modifying memory.

High performance: Single node supports thousands of clients and guarantees zero downtime and zero data loss.
Persistence: Persists messages to disk. Prevents data loss by persisting data to disk and replication.
Distributed system, easy to expand. All components are distributed and machines can be expanded without downtime.
Reliability - Kafka is distributed, partitioned, replicated and fault tolerant.
Client-side state maintenance: The state of the message being processed is maintained on the Consumer side and can be automatically balanced when it fails.

Application Scenarios:

Log collection: with Kafka you can collect Logs from various services and process them through a big data platform;
Messaging systems: decoupling producers and consumers, caching messages, etc;
User activity tracking: Kafka is often used to record the various activities of Web users or App users, such as browsing, searching, clicking and other activities, these activities are published by various servers to Kafka's Topics, and then consumers subscribe to these Topics to do real-time monitoring and analysis of operational data, which can also be saved to the database;

2.4 Basic production-consumption processes

When created, a Sender thread is created and set as a daemon thread.

2. Produced messages first go through an interceptor - > serializer - > partitioner and then the message is cached in a buffer.

3. The conditions for batch sending are: the buffer data size reaches or reaches the upper limit.

4. After the batch is sent, it is sent to the specified partition and then dropped to the broker;

acks=0 As soon as the message is placed in the buffer, the message is considered to have been sent.
acks=1 indicates that the message only needs to be written to the primary partition. In this scenario, if the primary partition goes down after receiving the message acknowledgement and the replica partition has not had time to synchronize the message, the message is lost.
acks=all (default) The alpha partition waits for all ISR replica partitions to acknowledge the record. This processing ensures that messages are not lost as long as one ISR replica partition survives.

5. If the producer configures the retrires parameter to be greater than 0 and does not receive an acknowledgement, then the client retries the message.

6. drop disk to broker success, return production metadata to the producer.

2.5 leader election

Kafka maintains a collection called ISR (in-sync replica) for each Topic on Zookeeper;
kafka considers a message committed when all the replicas in the collection are synchronized with the replicas in the Leader;
Only these Follower's who stay in sync with the Leader should be selected as the new Leader;
Assuming that there are N+1 replicas of a topic. kafka can tolerate N servers being unavailable with low redundancy If all the replicas in the ISR are lost:

It can wait for any of the replicas in the ISR to be restored, followed by the external service, which takes time to wait;
A replica is selected from the OSR to be the Leader replica, at which point data loss occurs;

Copy message synchronization

First, the Follower sends a FETCH request to the Leader, which then reads the message data from the underlying log file and updates the LEO value of its copy of the Follower in memory to the fetchOffset value in the FETCH request. Finally, it tries to update the partition high water value. the Follower receives the FETCH response, writes the message to the underlying log, and then updates the LEO and HW values.

Related concepts: LEO and HW.

LEO: i.e. log end offset (log end offset), records the displacement value of the next message in the log of this replica. If LEO=10, then it means that the replica has saved 10 messages, and the displacement value range is [0, 9].
HW: The water level value HW (high watermark) is the backed up displacement. The HW value is not greater than the LEO value for the same replica object. All messages less than or equal to the HW value are considered "replicated".

Rebalance

Change in the number of group members
Change in the number of subscription topics
Change in the number of partitions in the subscription thread

After the leader election is completed, when the above three scenarios occur, the leader starts to allocate the consumption scenarios according to the configured RangeAssignor, i.e., which consumers are responsible for consuming which parts of which topics. once the allocation is completed, the leader will encapsulate the scenarios into a SyncGroup request and send it to the coordinator, and the non-leader will also send a SyncGroup request. The coordinator will also send a SyncGroup request, but the content will be empty. after receiving the allocation plan, the coordinator will encapsulate the plan in the SyncGroup response and send it to each consumer, so that all members of the group will know which partitions they should consume.

Partition allocation algorithm RangeAssignor

The principle is to divide the total number of consumers and the total number of partitions equally among all consumers;
Consumers subscribing to a Topic are sorted by the dictionary order of the name, split evenly, and the remaining dictionary order is distributed from front to back;

2.6 Idempotence

Ensure that the consumer does not duplicate processing when a message is retransmitted. Even if the processing is repeated when the consumer receives a duplicate message, the consistency of the final result is guaranteed. By idempotency, the mathematical concept is: f(f(x)) = f(x)

Add a unique ID, similar to a database primary key, to uniquely tag a message.

ProducerID:# each new Producer is assigned a unique PIDSequenceNumber when it is initialized:# each Topic that sends data for each PID corresponds to a SN value monotonically increasing from 0

2.7 How to ensure that data is not lost

Producer production messages can be resolved by comfirm configuring ack=all;
Broker synchronization process leader down can be solved by configuring ISR replica + retry;
Consumers lost can turn off the automatic submitoffset feature and submit offsets when the system finishes processing;

2.8 How to address backlogged consumption

Fix the consumer so that it has the ability to consume and expand it by N units;
Write a distribution program that evenly distributes Topics to temporary Topics;
N consumers are started at the same time, consuming different temporary Topics;

How to avoid message backlogs

Increasing consumption parallelism
bulk consumption
Reduce the number of component IO interactions
Priority consumption

2.9 How to Design a Message Queue

Need to support rapid horizontal expansion, broker + partition, partition put on different machines, increase the machine will be based on the topic of data migration, distributed need to consider consistency, availability, partition fault tolerance

Consistency: message acknowledgement for producers, idempotence for consumers, data synchronization for brokers;
Availability: how the data is guaranteed not to be lost or duplicated, how the data is persisted, and how it is read and written when persisted;
Partition fault tolerance: what election mechanism to use, how to synchronize multiple copies;
Massive data: how to solve the message backlog, massive Topic performance degradation;

Performance can draw on time wheels, zero-copy, IO multiplexing, sequential reads and writes, and compressed batch processing.

3.1 What roles does RocketMQ consist of, and what are the roles and characteristics of each?

3.2 Are messages in the RocketMQ Broker deleted immediately after they are consumed?

No, each message is persisted to the CommitLog, each Consumer connects to the Broker and maintains consumption progress information, when a message is consumed it's just the current Consumer's consumption progress (CommitLog's offset) that is updated.

3.3 Is consuming a message push or pull?

RocketMQ does not have a real sense of push, are pull, although there is a push class, but the actual underlying implementation of the long polling mechanism, that is, the way pulling

Why do you actively pull messages instead of using the event-listening approach?

The event-driven approach is to establish a good long connection and push in real time by means of events (sending data).

If the broker actively pushes the message, it is possible that the push speed is fast and the consumption speed is slow, then it will cause the message to pile up too much in the consumer side, and at the same time can not be consumed by other consumers. The pull approach can be based on their own current situation to pull, will not cause too much pressure and cause bottlenecks. So the pull method is adopted.

3.4 How are transaction messages implemented in RocketMQ?

Mechanism for implementing RocketMQ transaction messages:

1, half-message mechanism: first send a half-message to the message server, if the implementation of the local transaction is successful, then submit the message, otherwise rollback.

2. Local transaction execution: The message sender executes a local transaction after sending half a message.

3. Status check: The message server will periodically check the status of half messages.

4, transaction checkback: If the message state is uncertain, the message server will check back with the sender to ensure that the message is ultimately consistent.

3.5 How is message filtering implemented in RocketMQ?

The implementation of the RocketMQ message filtering feature:

1、Label Filtering: Producers set labels when sending messages and consumers selectively consume messages by specifying labels.

2, SQL92 filtering: support for filtering expressions based on the SQL92 standard , allowing more complex message filtering on the consumer side .

3、Client Filtering: After receiving the message, the consumer client can filter the message according to the customized logic.

4、Performance Optimization: Reduce the amount of data transmitted over the network through filtering to improve overall performance and efficiency.

3.6 How does RocketMQ ensure reliable message delivery?

Mechanism to ensure reliable delivery of RocketMQ messages:

1, message persistence: all messages are stored persistently on the server side to ensure that they will not be lost due to server failure.

2, synchronized dual-write: synchronized dual-write messages in the master and backup Broker to improve the reliability of the data.

3、Acknowledgement mechanism: After consuming a message, the consumer needs to send an acknowledgement to the Broker, and unconfirmed messages will be re-delivered.

4, transaction message support: provide transaction message mechanism to ensure that the local transaction and message sending atomicity.

3.7 Differences between rocketMQ and kafka components

Applicable Scenarios

Kafka
Originally developed by LinkedIn, Kafka is primarily used to process large-scale log data and real-time data streams. It is suitable for the following scenarios:

Log Collection: Kafka can efficiently collect, store and process large-scale log data.
Real-time data stream processing: Kafka can be used in conjunction with Apache Storm, Apache Flink and other real-time processing frameworks to achieve real-time data stream processing and analysis.
Data pipeline: Kafka can be used as a data pipeline between different systems for data transfer and synchronization.

RocketMQ
RocketMQ is developed by Alibaba and is mainly used to solve the problem of message passing in distributed systems. It is suitable for the following scenarios:

Distributed Transaction Processing: RocketMQ supports distributed transactions, suitable for dealing with business scenarios that require consistency assurance.
Message Push: RocketMQ supports multiple message push modes, including cluster consumption and broadcast consumption, suitable for scenarios that require pushing messages to multiple consumers.
Delayed Message Processing: RocketMQ supports delayed messages, suitable for business scenarios that require delayed processing.

architectural design
Kafka
Kafka uses a publish-subscribe model and consists of three main components: the Producer, the Consumer, and the Broker.

Broker: Kafka's Broker is a centerless design that can be scaled horizontally. Multiple Brokers form a Kafka cluster , sharing the task of data storage and processing .
Topic: The data in Kafka is divided into Topics, and each Topic can have multiple Partitions.Partitions are the key to achieving distributed storage and parallel processing in Kafka.
Producer and Consumer: The producer is responsible for sending messages to Kafka and the consumer is responsible for receiving messages from Kafka and processing them. Producers and consumers are coordinated and managed through Zookeeper.

RocketMQ
RocketMQ uses a master-slave architecture and consists of four main components: Producer, Consumer, NameServer, and Broker.

Broker: RocketMQ's brokers are divided into master and slave brokers, and data consistency is achieved by synchronous replication between master and slave. Multiple Brokers form a RocketMQ cluster and share the task of storing and processing data.
Topic: The data in RocketMQ is also divided into Topics, and each Topic can have multiple Queues. Queues are the key to achieving distributed storage and parallel processing in RocketMQ.
Producer and Consumer: The producer is responsible for sending messages to RocketMQ and the consumer is responsible for receiving messages from RocketMQ and processing them. Producers and consumers are coordinated and managed through NameServer.

performances
Kafka
Kafka has performance advantages in the following areas:

Data throughput: Kafka has extremely high data throughput, which can reach millions of messages per second.
Latency: Kafka has low latency and can deliver and process messages in milliseconds.
Data types: Kafka supports a variety of data types, including text, binary data and JSON, which can meet the needs of different scenarios.

RocketMQ
RocketMQ also excels in performance in several ways:

Data Throughput: RocketMQ's data throughput is also very high, reaching hundreds of thousands of messages per second.
Latency: RocketMQ also has low latency, allowing messages to be transmitted and processed in milliseconds.
Data Types: RocketMQ also supports a variety of data types, including text, binary data, and JSON.

dependability
Kafka
Kafka takes a variety of measures when it comes to reliability:

Data Backup: Kafka has multiple copies of each Partition, which can prevent data loss to some extent.
Master-slave mechanism: Kafka Broker between the master-slave replication mechanism to ensure data consistency.
High availability: Kafka's Broker can be scaled horizontally, and the availability of the system can be improved by increasing the number of Brokers. In addition, Zookeeper as a coordination center also improves the reliability of the system.
RocketMQ
RocketMQ also takes a variety of measures in terms of reliability:

Data Backup: RocketMQ has multiple copies of each Queue, which can prevent data loss to some extent.
High Availability: Synchronized replication mechanism is used between master and slave Broker to ensure data consistency. When the master Broker fails, the slave Broker can automatically switch to the master Broker to ensure system availability. In addition, the availability of the system can be improved by increasing the number of Brokers.NameServer as the coordination center also improves the reliability of the system. When a NameServer fails, other NameServers can take over its tasks to ensure the normal operation of the system. Producers and consumers automatically select available Brokers and NameServers for operation when sending and receiving messages, which improves the reliability of the system. Producers and consumers automatically select available Brokers and NameServers to operate when sending and receiving messages to avoid the risk of a single point of failure. RocketMQ also supports transactional messages to ensure consistency of distributed transactions.

(statistics) correlation

4.1 What is MQTT?

MQTT (Message Queuing Telemetry Transport) is a lightweight, publish/subscribe model-based communication protocol designed in 1999 by Andy Stanford-Clark of IBM and Arlen Nipper of Cirrus Link. It is designed for use in limited bandwidth and unstable network environments and is well suited for communication between Internet of Things (IoT) devices.
Key features of the MQTT protocol include:

Lightweight: MQTT's message headers are small, which reduces the device's bandwidth and power consumption, making it suitable for use on embedded devices.
Publish/Subscribe Mode: Allows a device to receive messages by publishing messages to a topic while multiple devices can subscribe to that topic.
Reliability: Three different Quality of Service (QoS) levels are supported to ensure reliable delivery of messages.
Asynchronous communication: the publisher and subscriber do not need to be online at the same time, the message will be stored on the server until the subscriber successfully receives it.
Simple and easy to use: simple protocols, easy to implement, support for multiple programming languages.
MQTT is widely used in many fields such as IoT, mobile communications, smart home, smart grid, remote monitoring and automotive.

4.2 How is it different from HTTP?

MQTT (Message Queuing Telemetry Transport) protocol and HTTP (Hypertext Transfer Protocol) protocol are both network protocols, but they have significant differences in design concepts, usage scenarios and performance characteristics:

Communication Mode:
MQTT is based on a publish/subscribe model that allows one device (the publisher) to send messages to topics, and multiple devices (the subscribers) can subscribe to those topics to receive messages.
HTTP is based on a request/response model where the client sends a request to the server and the server returns a response.
Uses and scenarios:
MQTT is designed for the Internet of Things (IoT) for bandwidth-limited, latency-sensitive, and unstable network environments such as mobile communications, embedded devices, and more.
HTTP is mainly used for web browsing, file transfer and API services, and is suitable for stable, high-bandwidth network environments.
Message size and overhead:
The MQTT protocol header is small and has low message overhead, making it suitable for transmitting small messages.
The HTTP protocol header is relatively large and is suitable for transferring larger data, such as files and images.
Quality of Service (QoS):
MQTT supports three different quality of service levels and you can choose different levels of messaging reliability as needed.
HTTP has no built-in quality of service level for messages, but reliability can be improved by implementing specific error handling and retry mechanisms.
Connection modeling:
MQTT maintains a long connection, where the client maintains a persistent session with the server until the client explicitly disconnects.
HTTP uses short connections, and the connection is usually closed after each request is completed. Although HTTP/1.1 supports persistent connections (keep-alive), the connection may be idle after each request/response interaction.
Real-time:
MQTT is designed for real-time communication with low latency for applications that require immediate response.
HTTP is not as real-time as MQTT, and although there are protocols such as WebSocket that provide real-time communication capabilities, HTTP itself is better suited for non-real-time applications.
Consumption of resources:
MQTT requires less device resources due to its lightweight design.
HTTP usually requires more resources, especially on mobile or embedded devices.
Overall, MQTT and HTTP have their own advantages and are suitable for different application scenarios. When choosing the right protocol, you need to decide based on the needs of the application, the network environment and device capabilities.

4.3 What are the advantages and disadvantages of the MQTT protocol?

The advantages and disadvantages of the MQTT protocol are listed below:
Pros:
1. Lightweight: The MQTT protocol has a very simple design with small message headers, which makes it ideal for use on devices with limited bandwidth and limited computing power, such as IoT devices.
2. Low latency: MQTT protocol supports real-time communication with low latency, which is suitable for application scenarios that require fast response.
3. Support for multiple quality of service (QoS) levels: MQTT provides three different levels of messaging reliability, you can choose the appropriate QoS level according to the needs of the application.
4. Efficient publish/subscribe model: MQTT uses a publish/subscribe model that allows a message to be received by multiple subscribers , which reduces network traffic and device resource consumption .
5. Good network adaptability: The MQTT protocol can work in unstable or intermittent network environments, such as mobile networks or satellite communications.
6. Support for persistent sessions: MQTT clients can establish a persistent session with the server , even if the client disconnects , the server will save its subscription information and unreceived messages until the client reconnects .
Drawbacks:
1. Security issues: Although MQTT supports encryption and authentication, it may not be secure enough by default. It is necessary to use TLS/SSL or other security measures at the transport layer to enhance security.
2. Unsuitable for large data transfers: The MQTT protocol is not suitable for transferring large amounts of data because it is designed for small, frequent messaging.
3. Dependence on a centralized broker: MQTT uses a centralized broker to handle message publishing and subscription, which can lead to single points of failure and performance bottlenecks.
4. Implementation complexity: Despite the simplicity of the MQTT protocol itself, the implementation of the client and server side on some platforms may require more development work.
5. Limited client support: While MQTT is widely used in IoT devices, HTTP is probably the more common communication protocol in traditional desktop or mobile applications.
When choosing to use the MQTT protocol, the advantages and disadvantages need to be weighed against the specific needs of the application, the network environment, and the capabilities of the device.

4.4 What are the message quality (Qos) levels in MQTT and explain what they mean?

The MQTT protocol defines three message quality levels (Quality of Service, QoS) that are used to specify the reliability of message delivery. These QoS levels are as follows:

QoS 0 - At Most Once:
This is the lowest QoS level where messages are delivered at most once.
The publisher sends the message to the proxy (Broker) and does not wait for any acknowledgement.
If the subscriber is online and able to receive messages, it will receive them; otherwise, messages may be lost.
This level applies to scenarios where message loss can be tolerated, such as real-time surveillance or sensor data, where the most recent data is more important than older data.
QoS 1 - At Least Once (At Least Once):
At this level, messages are delivered at least once.
After the publisher sends a message, it waits for an acknowledgement (PUBACK) from the agent.
If the agent does not receive an acknowledgement, it tries to resend the message.
The subscriber needs to send an acknowledgement (PUBACK) to the proxy after receiving the message.
This level applies to scenarios where message loss cannot be tolerated, but duplicate delivery can be tolerated.
QoS 2 - Exactly Once (Exactly Once):
This is the highest level of QoS, ensuring that messages are delivered exactly once.
The publisher sends the message and waits for an acknowledgement from the agent (PUBREC).
Upon receiving the message, the agent sends a PUBREC to the publisher and waits for a response from the publisher (PUBREL).
The publisher receives the PUBREC and sends the PUBREL to the agent.
The agent receives the PUBREL and sends the final confirmation (PUBCOMP) to the publisher.
The subscriber also needs to send an acknowledgement (PUBACK) to the proxy after receiving the message.
This level applies to critical data that needs to ensure that messages are not lost or passed repeatedly, such as billing systems or financial transactions.
Choosing the right QoS level depends on the specific needs of the application and the network environment. For example, in a stable network environment, QoS 0 may be sufficient because it has the lowest latency and overhead. Whereas in an unstable network environment, or when message reliability is important, QoS 1 or QoS 2 may be a better choice.

4.5 Can you give some application scenarios using MQTT?

The MQTT (Message Queuing Telemetry Transport) protocol is widely used in a variety of Internet of Things (IoT) and mobile application scenarios due to its lightweight, low-latency and reliability. The following are some typical MQTT application scenarios:

IoT device communication:
MQTT is widely used for communication between IoT devices, such as sensor networks, smart home devices, and industrial control systems.
It allows these devices to send and receive messages with very low bandwidth and power consumption.
Mobile Apps:
MQTT can be used for pushing messages from mobile applications, e.g., instant messages, notifications, weather updates, and so on.
It supports disconnected devices to receive offline messages when reconnected.
Smart Cities:
In smart city applications, MQTT is used to connect various devices and systems, such as traffic signals, surveillance cameras, and environmental monitoring stations.
It helps to enable real-time data collection and intelligent decision support.
Remote monitoring and management:
MQTT is used for remote monitoring and management systems, such as solar power stations, wind turbines, and telemedicine equipment.
It allows real-time monitoring of device status and remote control of the device.
Vehicles and transportation:
In Vehicular Networking (V2X) communication, MQTT can be used for communication between vehicles and between vehicles and infrastructure.
It supports real-time traffic information exchange, vehicle diagnostic data transmission, and so on.
Agriculture:
MQTT is used in precision agriculture to connect farm sensors, irrigation systems, drones, etc. to monitor crop growth environment and optimize farm management.
Financial services:
MQTT can be used for financial services applications such as financial trading systems, stock market data push, and real-time exchange rate updates.
Energy management:
In smart grid and energy management systems, MQTT is used to monitor energy consumption, control smart meters and grid devices.
Retail and logistics:
MQTT can be used in the retail and logistics industry for inventory management, logistics tracking, automated warehouse systems and more.
Smart Home:
MQTT is used for control and monitoring in smart home devices such as lights, thermostats, security systems, etc.
These application scenarios demonstrate the versatility and flexibility of the MQTT protocol, making it ideal for the IoT and mobile Internet space.