Understand the criteria for measuring architecture design in one article: functionality, availability, performance, scalability, security, collaboration efficiency, complexity, cost-effectiveness

Hello everyone, I am Master Tang~

The primary goal of architectural design is to serve business needs. Therefore, we should not blindly pursue the so-called "most powerful" architecture, but should be committed to finding the architectural solution that best suits the current business environment and future development needs.

Judging whether an architecture is sound is a complex process that requires a comprehensive assessment from several angles, mainly the following:

Functional requirements perspective: Evaluate whether the architecture effectively supports current business needs and has sufficient flexibility to adapt to future business development.
Non-functional requirements perspective: Evaluate key technical indicators such as system availability, performance, scalability and security.
Team collaboration perspective: Evaluate whether the architecture can effectively promote team collaboration and improve development efficiency, including complexity management and team collaboration efficiency.
Cost-benefit perspective: Evaluate the balance between technical investment and business value of architectural solutions, including costs such as development, operation and maintenance, hardware and technical debt.

Functionality

1. Solve existing business problems

The architectural design must be able to effectively solve existing business pain points while taking into account various scenarios of daily operations.

Take order management as an example: the system needs to handle operations such as order creation, cancellation, refund and return. If the architecture fully supports these operations and divides responsibilities clearly at the interface and module levels, it has good business coverage.

In addition, if the architecture can smoothly handle non-routine scenarios such as limited-time promotions and flash sales, that further demonstrates the maturity of its design.
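As a rough illustration of clear responsibilities at the module level, here is a minimal Python sketch of an order module that keeps every legal lifecycle transition (create, pay, cancel, refund, ship, return) in one table. The states, action names and the `Order` class are hypothetical, chosen only for illustration; a real order system would also persist state and cover many more cases.

```python
from enum import Enum


class OrderStatus(Enum):
    CREATED = "created"
    PAID = "paid"
    CANCELLED = "cancelled"
    SHIPPED = "shipped"
    REFUNDED = "refunded"
    RETURNED = "returned"


# Hypothetical transition table: (current status, action) -> next status.
TRANSITIONS = {
    (OrderStatus.CREATED, "pay"): OrderStatus.PAID,
    (OrderStatus.CREATED, "cancel"): OrderStatus.CANCELLED,
    (OrderStatus.PAID, "refund"): OrderStatus.REFUNDED,
    (OrderStatus.PAID, "ship"): OrderStatus.SHIPPED,
    (OrderStatus.SHIPPED, "return"): OrderStatus.RETURNED,
}


class Order:
    def __init__(self, order_id: str) -> None:
        self.order_id = order_id
        self.status = OrderStatus.CREATED

    def apply(self, action: str) -> OrderStatus:
        """Apply an action, rejecting any transition not listed in TRANSITIONS."""
        key = (self.status, action)
        if key not in TRANSITIONS:
            raise ValueError(f"cannot {action} an order in state {self.status.value}")
        self.status = TRANSITIONS[key]
        return self.status
```

Because every rule lives in `TRANSITIONS`, a question such as "can a created order be refunded?" has exactly one place to look, and an illegal operation fails loudly instead of silently corrupting state.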

2. Complete business needs efficiently

An excellent architecture should make functional development and iteration efficient, rather than relying on "patching" to achieve it.

Ideally, when the business side raises a new requirement, the technical team can quickly locate the relevant modules and extend them without large-scale changes to existing code. With a well-planned microservice or plug-in architecture, new functionality only needs to be implemented in specific services or plug-ins without affecting the rest of the system, which saves time and reduces risk.
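A minimal sketch of the plug-in idea, assuming a hypothetical registry of promotion pricing rules: the core `checkout` function looks rules up by name, so adding a new promotion means registering a new function rather than editing `checkout`. All names here are illustrative, not a prescribed API.

```python
from typing import Callable, Dict

# Hypothetical plug-in registry: promotion name -> pricing rule.
PROMOTIONS: Dict[str, Callable[[float], float]] = {}


def promotion(name: str):
    """Decorator that registers a pricing plug-in under `name`."""
    def register(fn: Callable[[float], float]) -> Callable[[float], float]:
        PROMOTIONS[name] = fn
        return fn
    return register


@promotion("none")
def no_discount(price: float) -> float:
    return price


@promotion("percent_off_10")
def ten_percent_off(price: float) -> float:
    return round(price * 0.9, 2)


def checkout(price: float, promo: str = "none") -> float:
    """Core logic: stays unchanged when new promotions are added."""
    rule = PROMOTIONS.get(promo)
    if rule is None:
        raise KeyError(f"unknown promotion: {promo}")
    return rule(price)


# A later flash-sale requirement is a new plug-in, not an edit to checkout().
@promotion("flash_sale")
def flash_sale(price: float) -> float:
    return round(price * 0.5, 2)
```

The same principle scales up to services: the stable core defines an extension point, and new business rules plug into it.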

3. Forward-looking design

Business needs keep changing and evolving. If the architecture has to be rebuilt every time functionality is extended, this not only wastes resources but also increases operational risk. A key criterion for judging an architectural design is therefore its resilience to future change.

A good architecture plans for version iteration and module replacement from the early stages, so that business logic and modules can be upgraded independently and a local change does not ripple through the whole system (as the Chinese saying goes, "pull one hair and the whole body moves"). An architecture that can grow steadily as the business evolves is forward-looking at the functional level.
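One common way to make a module independently replaceable is to have callers depend on an interface rather than a concrete implementation. A minimal sketch, assuming a hypothetical `PaymentGateway` interface with two interchangeable versions; `OrderService` does not change when v1 is swapped for v2.

```python
from typing import Protocol


class PaymentGateway(Protocol):
    """Hypothetical interface the order module depends on."""
    def charge(self, order_id: str, amount_cents: int) -> str: ...


class LegacyGateway:
    """Hypothetical v1 payment module."""
    def charge(self, order_id: str, amount_cents: int) -> str:
        return f"legacy-{order_id}-{amount_cents}"


class NewGateway:
    """Hypothetical v2 payment module, upgradable without touching OrderService."""
    def charge(self, order_id: str, amount_cents: int) -> str:
        return f"v2-{order_id}-{amount_cents}"


class OrderService:
    # Depends only on the PaymentGateway interface, not on a concrete module.
    def __init__(self, gateway: PaymentGateway) -> None:
        self.gateway = gateway

    def pay(self, order_id: str, amount_cents: int) -> str:
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        return self.gateway.charge(order_id, amount_cents)
```

Usage: `OrderService(LegacyGateway())` today, `OrderService(NewGateway())` after the upgrade; only the wiring changes.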

Availability

Availability refers to the ability of a system to operate normally for a long time without interruption.

Availability is often expressed as a number of "nines". For example, "four nines" (99.99%) availability means the system may be unavailable for at most 0.01% of the year, about 52.6 minutes (525,600 minutes × 0.0001 ≈ 52.6). In distributed systems, high availability is usually achieved through the following means:

Redundancy: Reduce risk by adding machines that share the load. For example, the same service is deployed on multiple servers, or the same data is stored on multiple servers so the copies back each other up.
Self-recovery: The system recovers quickly when problems occur, without affecting business availability. This includes strategies such as timeouts, retries, rollback and data recovery.
Rate limiting: Prevent the system from being overwhelmed by controlling the volume and rate of incoming requests.
Degradation: When system pressure is too high, temporarily switch off some non-core functions to keep core functions running normally.
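Redundancy is mostly a deployment concern, but rate limiting, self-recovery and degradation can be sketched in a few lines. Below is a minimal Python sketch of a token-bucket rate limiter and a retry-with-exponential-backoff helper that falls back to a degraded result. `TokenBucket` and `call_with_retry` are hypothetical names, and production systems usually get these behaviors from a gateway or a mature library rather than hand-rolled code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TokenBucket:
    """Rate limiting: allow bursts of up to `capacity` requests,
    refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the sketch can be tested deterministically
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should reject or queue the request


def call_with_retry(primary: Callable[[], T], fallback: Callable[[], T],
                    attempts: int = 3, base_delay: float = 0.1,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Self-recovery plus degradation: retry `primary` with exponential
    backoff, then serve a degraded `fallback` result instead of failing."""
    for i in range(attempts):
        try:
            return primary()
        except Exception:  # a real system would retry only transient errors
            if i < attempts - 1:
                sleep(base_delay * 2 ** i)  # 0.1s, 0.2s, 0.4s, ...
    return fallback()
```

For example, a product-detail page might call `call_with_retry(fetch_live_recommendations, lambda: [])`, so that a struggling recommendation service degrades to an empty list instead of breaking the page (`fetch_live_recommendations` is a hypothetical function).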

Performance

Performance mainly covers key indicators such as response time, throughput and resource utilization. The system needs to respond stably during business peaks and avoid noticeable slowdowns as data volume and request rates grow.

There are many ways to improve performance, including using caching, asynchronous communication, and efficient load balancing strategies. But it should be noted that performance optimization often increases costs. Excessive pursuit of performance can lead to higher hardware and maintenance costs, so a balance between performance and cost needs to be found.
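Caching is the most common of these techniques. A minimal cache-aside sketch with a time-to-live (TTL): read from the cache, call the slow loader on a miss or after expiry, and store the result. `TTLCache` and its loader are hypothetical; note that the TTL is itself a trade-off knob between latency, backing-store load and data freshness.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache-aside with expiry: entries live for `ttl_seconds`."""

    def __init__(self, loader: Callable[[Hashable], Any], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.loader = loader          # slow source of truth, e.g. a database query
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[Hashable, Tuple[Any, float]] = {}
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]           # hit: no trip to the backing store
        self.misses += 1
        value = self.loader(key)      # miss or expired: load and repopulate
        self._store[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Call after writes so readers do not keep seeing stale data."""
        self._store.pop(key, None)
```

This sketch has no size limit or eviction policy, which is exactly where memory cost enters the picture in a real deployment.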

To evaluate system performance, focus mainly on the following indicators:

Average response time (ART): The average time from initiating a request to receiving a response.
Throughput (TPS or QPS): The number of transactions (TPS) or queries (QPS) the system can process per second.
Resource utilization: Usage of CPU, memory, disk I/O, network bandwidth, etc.
P95/P99 response time: The latency within which 95% or 99% of requests complete; used to identify long-tail latency that averages hide.
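To show how these indicators relate, here is a small sketch that computes them from raw latency samples collected over a time window. It uses the nearest-rank percentile definition; monitoring tools may interpolate differently, so exact values can vary slightly between tools. The function names are hypothetical.

```python
import math
from typing import Dict, List


def percentile(samples: List[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms: List[float], window_seconds: float) -> Dict[str, float]:
    """Summarize request latencies collected over a time window."""
    return {
        "avg_ms": sum(latencies_ms) / len(latencies_ms),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "throughput_qps": len(latencies_ms) / window_seconds,
    }
```

With 100 requests in 10 seconds, 98 of them at 10 ms plus one at 500 ms and one at 1,000 ms, the average is 24.8 ms and P95 is 10 ms, but P99 is 500 ms: the long tail that the average alone would hide.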

Scalability

Scalability refers to the ability of a system to easily adapt to future demand growth and business expansion without the need for major changes to the system architecture.

Specifically, improving scalability includes the following aspects:

Architecture design: Use appropriate design patterns and architectural styles (such as DDD layering or microservices) so the system can flexibly add and extend functionality.
Performance: The system can absorb business growth by adding hardware resources, without significant code changes.
Data processing: The system can handle rapidly growing data volumes through techniques such as database and table sharding (splitting data across multiple databases and tables).
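A minimal sketch of routing a record to a database and table by ID, assuming a hypothetical layout of 4 databases with 8 order tables each and simple modulo hashing. Real deployments often prefer consistent hashing or a lookup directory instead, because with plain modulo most data has to move whenever the shard count changes.

```python
from typing import Tuple

DB_COUNT = 4          # hypothetical: number of physical databases
TABLES_PER_DB = 8     # hypothetical: order tables per database
TOTAL_SHARDS = DB_COUNT * TABLES_PER_DB


def route(user_id: int) -> Tuple[str, str]:
    """Map a user ID to (database, table) so one user's orders stay together."""
    if user_id < 0:
        raise ValueError("user_id must be non-negative")
    shard = user_id % TOTAL_SHARDS
    db_index, table_index = divmod(shard, TABLES_PER_DB)
    return f"order_db_{db_index}", f"orders_{table_index}"
```

Sharding by user ID keeps a single user's order queries on one shard; a query across all users, by contrast, has to fan out to every shard, which is one of the costs this technique introduces.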

Security

Security is an indispensable indicator for any system. It covers the confidentiality, integrity and availability of data, and requires protection against external attacks, internal misuse and data leakage.

A security system spans everything from network-layer defense, through application-layer authentication and encryption, to data-layer auditing and backup. Common security measures include:

Permission management: Set appropriate permissions for different roles to prevent unauthorized access.
Data encryption: Implement encryption strategies at both the transmission and storage levels.
Firewall and security group: Limit the open port range and reduce system exposure.
Vulnerability scanning and penetration testing: Proactively discover security risks in the system.
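Permission management is often implemented as role-based access control (RBAC). A minimal sketch, assuming a hypothetical role-to-permission table for an order system and a decorator that denies by default:

```python
import functools
from typing import Callable, Dict, Sequence, Set

# Hypothetical role -> permission mapping for an order system.
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "customer": {"order:create", "order:cancel"},
    "support": {"order:read_any", "order:refund"},
    "admin": {"order:read_any", "order:refund", "order:delete"},
}


def has_permission(roles: Sequence[str], permission: str) -> bool:
    # Unknown roles grant nothing (deny by default).
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


def require(permission: str) -> Callable:
    """Decorator that rejects the call unless the caller's roles grant `permission`."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(roles: Sequence[str], *args, **kwargs):
            if not has_permission(roles, permission):
                raise PermissionError(f"missing permission: {permission}")
            return fn(roles, *args, **kwargs)
        return wrapper
    return decorator


@require("order:refund")
def refund_order(roles: Sequence[str], order_id: int) -> str:
    return f"refunded order {order_id}"
```

Keeping the mapping in one table makes permissions auditable: reviewing who can refund an order means reading one dictionary rather than searching the code base.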

When evaluating system security, in addition to compliance audits, you also need to focus on the system's defense and recovery capabilities. Key inspection items include:

Whether it can effectively defend against common attacks such as SQL injection and XSS.
Whether security patches are updated in a timely manner.
Whether a complete log audit and alarm mechanism has been established.
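For the first check, the standard defenses are parameterized queries against SQL injection and output escaping against XSS. A small sketch using Python's built-in `sqlite3` and `html` modules; the `orders` table and the helper names are hypothetical.

```python
import html
import sqlite3
from typing import List, Tuple


def make_demo_db() -> sqlite3.Connection:
    """Hypothetical in-memory orders table for the demo."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    conn.executemany("INSERT INTO orders (id, customer) VALUES (?, ?)",
                     [(1, "alice"), (2, "bob")])
    return conn


def find_orders(conn: sqlite3.Connection, customer: str) -> List[Tuple[int, str]]:
    # Parameterized query: the driver binds `customer` as data, so input such as
    # "' OR '1'='1" is compared literally instead of being executed as SQL.
    cur = conn.execute("SELECT id, customer FROM orders WHERE customer = ?", (customer,))
    return cur.fetchall()


def render_comment(text: str) -> str:
    # Escape user text before embedding it in HTML to block stored XSS.
    return f"<p>{html.escape(text)}</p>"
```

Had `find_orders` built the SQL with string formatting, the input `' OR '1'='1` would have turned the condition into one that is always true and returned every customer's orders; with the placeholder it simply matches nothing.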

Teamwork efficiency

Architectural design ultimately has to be carried out by a team. A complex architecture is hard to maintain without adequate team support. Conversely, a team without architectural constraints may have everyone working in isolation, leading to a fragmented system and a chaotic technology stack.

Therefore, team collaboration and clear ownership are important dimensions for measuring an architecture. When the architecture maps different business areas to corresponding teams, each team only needs to focus on its own business context, and collaboration goes more smoothly. However, if module boundaries are blurred and team members modify each other's code at will, communication costs and the risk of conflicts will rise. Useful questions for this dimension include:

Average lead time from requirement to production: Is delivery severely delayed by cross-team coordination?
Cross-department communication costs: Are there many repeated meetings, requirement hand-offs, or interface integration problems?
Fault attribution and handling mechanism: After a fault occurs, can the responsible team be quickly found and repaired?

Good architectural design must take into account the team's size, organizational culture and goals. Large companies may suit a "modules by business line" approach, while small teams may be better served by "full-stack development". There is no one-size-fits-all best practice: the architecture that works best for your team is the best architecture.

Complexity

As the business continues to expand, the complexity of the system architecture will rise sharply. If not controlled, the system will become difficult to understand and maintain, ultimately reducing development efficiency and innovation capabilities. The core of complexity management is to maintain the understandability and maintainability of the system during its continuous evolution.

The most common sources of complexity include:

Feature piling: New features keep being added, but existing features and shared logic are not consolidated and optimized in time.
Unreasonable module splitting: Microservices or modules split too finely make communication complicated, while modules that are too large become hard to maintain.
Chaos of the technology stack: Each team introduces diverse languages, frameworks and tools at different times, and there is a lack of unified standards.

To measure complexity management, you can observe:

Clarity of module dependency graph: Whether the calling relationship between services is clearly visible.
Consistency between documentation and the current system: Outdated, unmaintained documentation increases system complexity.
Consistency of the team's understanding of the business and architecture: Are there "black box modules" that only a few people understand.
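One way to keep the dependency graph clear is to check it automatically, for example failing a build when services depend on each other in a cycle. A minimal depth-first-search sketch, assuming the graph is given as a hypothetical dict from each service to the services it calls:

```python
from typing import Dict, List, Optional


def find_cycle(deps: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a path (first node repeated at the end),
    or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2       # unvisited, on current path, finished
    color = {node: WHITE for node in deps}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for dep in deps.get(node, []):
            state = color.get(dep, WHITE)
            if state == GRAY:          # back edge: dep is already on the path
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(deps):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```

Usage: with `{"order": ["payment"], "payment": ["account"], "account": ["order"]}`, the check reports the cycle `order -> payment -> account -> order`, which is usually a sign that a responsibility sits in the wrong service.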

To reduce complexity, you can start from the following aspects:

Macro level: clearly divide business domains and sub-domains so that teams can maintain stable business boundaries.
Meso-level: Unify technology stack and framework selection to avoid reinventing the wheel.
Micro level: Regularly refactor old modules, extract common components, and clean up obsolete code.

Cost-effectiveness

Cost-effectiveness refers to whether the system's input and output are balanced while meeting core indicators and business needs.

It involves hardware resources, cloud service overhead, human operation and maintenance costs, and potential expenses caused by technical debt.

Over-investing in high availability and high performance when the budget or team resources are limited can cost more than it returns.

Some teams build complex "highly available, high-performance" systems prematurely, before the business scale justifies them, so the investment far outweighs the return.

Other teams only realize their architecture's performance is insufficient after the business has expanded rapidly, and have to scramble to respond, which ultimately leads to frequent production incidents and a lot of overtime. For architects, accurately judging the system's scale and the right timing for its evolution is an important test.

This article is included on my technical website, which contains detailed explanations and experience sharing on AI programming, LeetCode algorithms, standard interview knowledge ("eight-part essays"), BAT interview questions, resume templates, architecture design and more.