This article was simultaneously published on the public account Mobile Development Things: Android Stability (2): Governance Ideas
Generally speaking, Android stability covers crashes and ANRs. This article focuses on crashes (the application's crash rate) and describes how to approach Android stability work. Before getting into specific ideas, let's first look at Android's exception catching mechanism.
1 Exception catching mechanism
At the language level, Android's exception catching mechanism can be divided into the Java layer and the native (C++) layer.
1.1 Java exception catching mechanism
1.1.1 Basics
Throwable is the base class of all exceptions and has two important subclasses:
- Error: serious system errors such as OOM (OutOfMemoryError), which an application generally cannot handle;
- Exception: exceptions that the application can catch and handle, such as NPE (NullPointerException).
What we usually deal with in code are Exception types, which fall into two categories depending on whether they must be handled at compile time:
- Checked exceptions: must be handled at compile time or the code will not compile (usually by wrapping the call in try-catch or by declaring the exception with a throws clause in the method signature), for example IOException during file operations;
- Unchecked exceptions: not required to be handled at compile time but can occur at runtime, namely RuntimeException and its subclasses.
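The difference between the two categories can be illustrated with a minimal sketch on a plain JVM. The class name, method names, and file path below are made up for illustration only:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

class ExceptionKindsDemo {
    // Checked: IOException must be declared with throws (or caught), otherwise this does not compile.
    static byte[] readFile(String path) throws IOException {
        return Files.readAllBytes(Paths.get(path));
    }

    // Unchecked: NullPointerException is a RuntimeException, so no declaration is required.
    static int lengthOf(String s) {
        return s.length();
    }

    // The caller is forced by the compiler to deal with the checked exception.
    static String tryRead(String path) {
        try {
            readFile(path);
            return "ok";
        } catch (IOException e) {
            return "checked: " + e.getClass().getSimpleName();
        }
    }

    // Catching the unchecked exception is optional; without the catch it would propagate at runtime.
    static String tryLength(String s) {
        try {
            return "length " + lengthOf(s);
        } catch (RuntimeException e) {
            return "unchecked: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Hypothetical path that is assumed not to exist.
        System.out.println(tryRead("/hypothetical/missing/file.txt"));
        System.out.println(tryLength(null));
    }
}
```

Removing the throws clause from readFile makes the compiler reject the code, while lengthOf compiles without any declaration and only fails when it is called with null.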
1.1.2 Use
Besides the usual try-catch around code that may throw, you can also use Thread.setDefaultUncaughtExceptionHandler to capture exceptions application-wide. For example, the handler can be set in the Application class. When an unhandled exception occurs, the exception information can be recorded for later analysis. Global exception handling is usually implemented with a custom class that implements the UncaughtExceptionHandler interface:
import androidx.annotation.NonNull;

class ACrash implements Thread.UncaughtExceptionHandler {
    // The handler that was installed before this one (may be null).
    public Thread.UncaughtExceptionHandler exceptionHandler;

    @Override
    public void uncaughtException(@NonNull Thread t, @NonNull Throwable e) {
        // Customized processing is done here based on the exception type and thread.
        // After the custom logic, decide whether to pass the exception on to the original handler.
        if (exceptionHandler != null) {
            exceptionHandler.uncaughtException(t, e);
        }
    }
}
When installing a custom exception handler, note that if you also use a third-party crash collection system such as bugly or ACRA, you need to check whether a handler has already been set and keep it:
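Thread.UncaughtExceptionHandler tmpHandler = Thread.getDefaultUncaughtExceptionHandler();
ACrash aHandle = new ACrash();
// Retain the original exception handler so it can still be called
aHandle.exceptionHandler = tmpHandler;
Thread.setDefaultUncaughtExceptionHandler(aHandle);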
1.2 Native exception catching mechanism
1.2.1 Basics
Besides throwing and catching exceptions with throw and try-catch, as in Java, the native layer also has a system-level signal dispatch and handling mechanism. The system reports abnormal conditions by delivering signals, so native exception handling is essentially signal handling.
A signal handler is usually registered with the sigaction function:
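#include <signal.h>
#include <string.h>

static void signalHandler(int signal, siginfo_t *info, void *reserved) {
    // handle signal (only async-signal-safe functions may be called here)
}

void initSignalHandler() {
    struct sigaction action;
    memset(&action, 0, sizeof(action)); // avoid uninitialized fields
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_SIGINFO;       // use the three-argument sa_sigaction handler
    action.sa_sigaction = signalHandler;
    sigaction(SIGSEGV, &action, NULL);  // capture segmentation faults
}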
Common signals:
- SIGSEGV (11): invalid memory reference
- SIGABRT (6): abort signal raised by abort
- SIGFPE (8): arithmetic (floating point) exception
- SIGILL (4): illegal instruction
- SIGBUS (7, or 10 on some platforms): bus error (memory access error)
- SIGKILL (9): kill signal (cannot be caught)
1.2.2 Use
After a native exception is caught, you still have to obtain the minidump file and restore the full stack, which is fairly complicated. So we generally do not register signal handlers ourselves but use a third-party solution such as bugly, or the breakpad library (bugly also uses breakpad underneath). For breakpad usage, see: How to use Google Breakpad on Android platform
2 Classification and governance ideas
2.1 Classification
Apart from routine business-code fixes (such as ordinary NPE, IndexOutOfBoundsException, etc.), stability optimization can be roughly divided, from the operating system's perspective, into the following categories:
- Memory stability optimization
- Thread stability optimization
- System problem optimization
2.2 Governance ideas
Stability governance ultimately serves users, so all of the governance work revolves around improving the user experience. The main ideas are:
- If a problem can be fixed, fix it (e.g. NPE, OOM);
- If it cannot be fixed at the business level (e.g. system bugs), minimize the impact on users and downgrade some exceptions.
Improving the user experience comes down to reducing the application's crash rate. The core principle here: spend most of the effort on analyzing the Top 10 and Top 20 problems, solve the main problems first, and handle long-tail problems along the way.
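As a rough illustration of the Top N principle, the sketch below ranks crash signatures by frequency and reports how much of the total the top entries cover. The class name, the signature strings, and the counts are hypothetical sample data, not real measurements:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopCrashRanking {
    // Counts crashes per signature and returns the n most frequent, most frequent first.
    static List<Map.Entry<String, Long>> topN(List<String> signatures, int n) {
        Map<String, Long> counts = new HashMap<>();
        for (String s : signatures) {
            counts.merge(s, 1L, Long::sum);
        }
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    // Fraction of all crashes that is covered by the given top entries.
    static double share(List<Map.Entry<String, Long>> top, int total) {
        long covered = 0;
        for (Map.Entry<String, Long> e : top) {
            covered += e.getValue();
        }
        return total == 0 ? 0.0 : (double) covered / total;
    }

    public static void main(String[] args) {
        // Hypothetical crash signatures, one entry per reported crash.
        List<String> reports = Arrays.asList(
                "NullPointerException@FeedAdapter",
                "OutOfMemoryError@ImageLoader",
                "NullPointerException@FeedAdapter",
                "IndexOutOfBoundsException@CartList",
                "OutOfMemoryError@ImageLoader",
                "NullPointerException@FeedAdapter");
        List<Map.Entry<String, Long>> top = topN(reports, 2);
        System.out.println(top + " cover " + share(top, reports.size()));
    }
}
```

If a handful of signatures cover most reports, fixing them moves the crash rate far more than working through the long tail one by one.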
2.2.1 Memory management
Memory governance focuses on reducing runtime memory usage as much as possible while avoiding memory leaks and out-of-memory problems. Some tools can help determine where the problem is likely to be:
- LeakCanary: Square's open source tool for detecting and diagnosing memory leaks in Android applications, used during development;
- KOOM: Kuaishou's online memory monitoring solution, which helps optimize application memory;
- Profiler: Android's built-in performance monitoring tool, which can assist with memory analysis.
(There are other memory analysis tools in the industry as well; pick whichever one solves the problem.)
For ways to optimize memory, see the previous article: Android Stability (1): Memory Usage Guide
2.2.2 Thread management
Thread management mainly focuses on:
- Reuse threads and schedule work through thread pools wherever possible (different businesses use different thread pool strategies);
- Thread reclamation: once a thread is no longer needed, call shutdown promptly, and clear any variables that still hold it.
Here are several optimization directions the author has tried in practice:
- Limit the maximum number of threads used by OkHttpClient to avoid unbounded growth, and reuse the same OkHttpClient as much as possible;
- Converge thread creation: provide a few methods for obtaining threads from a thread pool so that business code does not create threads directly with new;
- Initialize the thread pool's core thread count differently for different businesses;
- Prevent thread instances from being held by singletons, which leaves resources unreleasable after the thread is closed.
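The pooling and shutdown points above can be sketched with the JDK's ThreadPoolExecutor. This is a minimal sketch, not the author's implementation: the class name, pool name, pool size, and timeouts are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadPoolSketch {
    // Single entry point for business code, so nobody calls new Thread() directly.
    // The thread count is chosen per business by the caller.
    static ThreadPoolExecutor newPool(String name, int threads) {
        AtomicInteger counter = new AtomicInteger();
        ThreadFactory factory = r -> new Thread(r, name + "-" + counter.incrementAndGet());
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 30L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>(), factory);
        pool.allowCoreThreadTimeOut(true); // idle threads are reclaimed after 30 seconds
        return pool;
    }

    // Runs the tasks on the pool and returns how many distinct threads were used.
    static int runAndCountThreads(ThreadPoolExecutor pool, int tasks) {
        Set<String> names = ConcurrentHashMap.newKeySet();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            futures.add(pool.submit(() -> { names.add(Thread.currentThread().getName()); }));
        }
        try {
            for (Future<?> f : futures) {
                f.get();
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return names.size();
    }

    // Shuts the pool down and waits briefly; returns true if all threads terminated.
    static boolean shutdownAndWait(ThreadPoolExecutor pool) {
        pool.shutdown();
        try {
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newPool("demo-io", 2);
        System.out.println("threads used: " + runAndCountThreads(pool, 10));
        System.out.println("terminated: " + shutdownAndWait(pool));
    }
}
```

Ten tasks run on at most two threads here, and the pool is shut down as soon as it is no longer needed. After shutdown, the caller should also drop its reference to the pool so that nothing long-lived keeps it reachable.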
2.2.3 System problem management
Because of Android version fragmentation, you will encounter all kinds of crashes whose stacks contain only system frames, and these cannot be solved at the business level. Each one has to be analyzed case by case. For system problems there is one general governance (analysis) approach: cluster the system versions on which the problem occurs to determine whether it is specific to certain versions or a general problem; after forming a hypothesis, analyze the source code of the corresponding version to verify it (hypothesis -> determine the problem -> solve the problem).
Android source code can be browsed online at: Android Code Search
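The version clustering step described above can be sketched as follows. The class name, the 0.7 threshold, and the sample version list are assumptions made for illustration; in practice the shares should also be compared against the version distribution of all users before drawing conclusions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class VersionCluster {
    // Share of one crash's reports per OS version, e.g. {"8.1" -> 0.75}.
    static Map<String, Double> shareByVersion(List<String> osVersions) {
        Map<String, Double> share = new TreeMap<>();
        for (String v : osVersions) {
            share.merge(v, 1.0, Double::sum);
        }
        share.replaceAll((v, count) -> count / osVersions.size());
        return share;
    }

    // Returns the version that accounts for at least `threshold` of the reports, or null
    // if no version dominates. Use a threshold above 0.5 so the answer is unique.
    static String dominantVersion(List<String> osVersions, double threshold) {
        for (Map.Entry<String, Double> e : shareByVersion(osVersions).entrySet()) {
            if (e.getValue() >= threshold) {
                return e.getKey();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hypothetical OS versions attached to the reports of a single crash signature.
        List<String> versions = Arrays.asList("8.1", "8.1", "8.1", "10");
        System.out.println(shareByVersion(versions));
        System.out.println("dominant: " + dominantVersion(versions, 0.7));
    }
}
```

A dominant version suggests reading the source of that specific release first, while an even spread points to a general problem.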
Once the problem is identified, think about how to solve it (system problems most likely cannot be cured, and the impact on users can only be minimized). Here is a general framework:
- Problems that can be handled by hooking system interfaces (generally the hook has to be done in the C++ layer), such as:
  - Expanding system limits. For example, on Android 8.1 the system's file descriptor limit is 1024, which can be raised to 4096 through a hook, reducing problems caused by fd exhaustion;
  - Reducing the severity of a system problem, e.g. downgrading a crash caused by abort in RenderThread to dropped frames;
- Problems that cannot be handled by hooking the system:
  - Exceptions that do not affect the user, such as DeadSystemException and FinalizerWatchdogDaemon timeouts, can be caught directly in the business layer (see the exception catching mechanism above);
  - Exceptions that affect the user experience should still go through the normal crash logic.
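Catching a harmless system exception in the business layer can be sketched with an uncaught exception handler that swallows one known case and forwards everything else. This is a sketch under stated assumptions: the class name is hypothetical, and it assumes the finalizer watchdog timeout arrives as a TimeoutException on a thread named "FinalizerWatchdogDaemon". DeadSystemException is left out so the example runs on a plain JVM.

```java
import java.util.concurrent.TimeoutException;

class DowngradingHandler implements Thread.UncaughtExceptionHandler {
    private final Thread.UncaughtExceptionHandler previous; // may be null
    int swallowed = 0; // number of downgraded exceptions, e.g. for monitoring

    DowngradingHandler(Thread.UncaughtExceptionHandler previous) {
        this.previous = previous;
    }

    // Assumption: the watchdog timeout is a TimeoutException thrown on the
    // thread named "FinalizerWatchdogDaemon".
    static boolean isIgnorable(Thread t, Throwable e) {
        return e instanceof TimeoutException
                && "FinalizerWatchdogDaemon".equals(t.getName());
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        if (isIgnorable(t, e)) {
            swallowed++; // record it, but do not forward it to the crash logic
            return;
        }
        if (previous != null) {
            previous.uncaughtException(t, e); // everything else still crashes as usual
        }
    }

    public static void main(String[] args) {
        DowngradingHandler handler = new DowngradingHandler(
                (t, e) -> System.out.println("forwarded: " + e));
        // Simulate both cases by calling the handler directly.
        handler.uncaughtException(new Thread("FinalizerWatchdogDaemon"), new TimeoutException());
        handler.uncaughtException(new Thread("worker"), new IllegalStateException("boom"));
        System.out.println("swallowed: " + handler.swallowed);
    }
}
```

In an application the handler would be installed with Thread.setDefaultUncaughtExceptionHandler, passing the previously installed handler as previous so that third-party crash reporting keeps working.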
3 Summary
This article focuses on the crash rate of Android applications and describes Android stability work. It first introduces the exception catching mechanism and then proposes ideas for classification and governance, aiming to reduce application crash rates and improve the user experience.
By setting up the exception catching mechanism sensibly, focusing on the application's Top 10 and Top 20 problems, and classifying and governing the problems, application stability can be effectively improved.