A Guide to the Android Input ANR Mechanism

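Most Android engineers have run into Application Not Responding (ANR) problems: when an app fails to respond in time, the system shows an ANR dialog that lets the user choose between waiting and force-closing the app.

A while ago, device vendors that preinstall our app reported quite a few ANRs, so I took the opportunity to review how ANRs work. ANRs also make up a large share of app stability monitoring, and a solid understanding of the ANR flow helps when building that monitoring. I hope these notes on input ANRs are useful to you.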

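ANR information

After an ANR occurs, three main pieces of information help developers fix the problem: the log output shows the cause of the ANR, the trace log shows the stack of the code path that is taking too long, and a dialog tells the user that an ANR has occurred.

Log output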

ANR in com.bugsnag.android.example (com.bugsnag.android.example/.ExampleActivity)
PID: 5353
Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 7. Wait queue head age: 5518.6ms.)
Load: 0.19 / 0.13 / 0.05
CPU usage from 0ms to 9178ms later (2019-09-28 20:29:00.987 to 2019-09-28 20:29:10.164):
5.9% 5353/com.bugsnag.android.example: 3.1% user + 2.8% kernel / faults: 840 minor
3.7% 1783/system_server: 1.1% user + 2.5% kernel / faults: 471 minor
.....
7.5% TOTAL: 2.8% user + 4.7% kernel

Trace log

----- pid 1783 at 2019-09-28 18:04:06 -----
Cmd line: system_server
Build fingerprint: 'google/sdk_gphone_x86/generic_x86:8.1.0/OSM1.180201.031/5455776:userdebug/dev-keys'
ABI: 'x86'
Build type: optimized
### Classloader Info
Zygote loaded classes=5300 post zygote classes=3590
### Intern Table
Intern table: 47863 strong; 1329 weak
### JavaVM
JNI: CheckJNI is on; globals=2309 (plus 58 weak)
Libraries: /system/lib/libandroid.so /system/lib/libandroid_servers.so /system/lib/libcompiler_rt.so /system/lib/libjavacrypto.so /system/lib/libjnigraphics.so /system/lib/libmedia_jni.so /system/lib/libsoundpool.so /system/lib/libwebviewchromium_loader.so /system/lib/libwifi-service.so libjavacore.so libopenjdk.so (11)
### Heap
Heap: 1% free, 14MB/15MB; 252408 objects
Dumping cumulative Gc timings
...
### oat
/system/priv-app/Telecom/oat/x86/Telecom.odex: speed
/system/framework/oat/x86/services.odex: speed
/system/framework/oat/x86/ethernet-service.odex: speed
/system/framework/oat/x86/wifi-service.odex: speed
/system/framework/oat/x86/com.android.location.provider.odex: speed
/system/priv-app/SettingsProvider/oat/x86/SettingsProvider.odex: speed
/system/priv-app/FusedLocation/oat/x86/FusedLocation.odex: speed
/system/framework/oat/x86/com.android.location.provider.odex: speed
Running non JIT
### ThreadList: split into attached threads and unattached threads
suspend all histogram: Sum: 2.261ms 99% C.I. 0.133us-782.400us Avg: 75.366us Max: 811us
DALVIK THREADS (102):
"Signal Catcher" daemon prio=5 tid=2 Runnable
| group="system" sCount=0 dsCount=0 flags=0 obj=0x12c80000 self=0xe4488000
| sysTid=1788 nice=0 cgrp=default sched=0/0 handle=0xe4750970
| state=R schedstat=( 6939261 18404 8 ) utm=0 stm=0 core=2 HZ=100
| stack=0xe4656000-0xe4658000 stackSize=1006KB
| held mutexes= "mutator lock"(shared held)
native: #00 pc 0047eae1 /system/lib/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, int, BacktraceMap*, char const*, art::ArtMethod*, void*)+209)
.....
"android.anim" prio=5 (not attached)
| sysTid=5358 nice=-10 cgrp=default
| state=S schedstat=( 860788 0 3 ) utm=0 stm=0 core=2 HZ=100
kernel: futex_wait_queue_me+0xdf/0x130
kernel: futex_wait+0x113/0x22e
kernel: do_futex+0xbe/0x825
kernel: compat_SyS_futex+0xb7/0x129
kernel: sysenter_dispatch+0x7/0x1a
----- end 1783 -----

Dialog

Input mechanism flows

Main source files:

/frameworks/native/services/inputflinger/InputDispatcher.cpp
/frameworks/native/services/inputflinger/InputReader.cpp
/frameworks/native/services/inputflinger/EventHub.cpp

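The InputDispatcher thread (InputDispatcherThread) dispatches input events to the target window. It relies on several important event queues:

  • mInboundQueue holds the input events sent over by InputReader
  • outboundQueue holds the input events that are about to be dispatched to the target app window
  • waitQueue holds the input events that have been dispatched to the target app but that the app has not finished processing
  • mCommandQueue holds certain special events (such as the ANR discussed here)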

The relationship between mInboundQueue, outboundQueue, waitQueue, and mCommandQueue is shown below.
[Figure: relationship between the event queues]
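
To make the hand-off between the queues concrete, here is a minimal sketch of how an event might move through them. It is not AOSP code: MiniEvent, MiniConnection, MiniDispatcher, and all of their methods are hypothetical names made up for illustration, the command queue is left out, and locking is ignored.

```cpp
#include <cstddef>
#include <deque>
#include <string>

// Hypothetical, simplified stand-ins for the dispatcher's queues (not AOSP types).
struct MiniEvent {
    int seq;            // sequence number used to match the app's "finished" reply
    std::string label;  // e.g. "KEY_DOWN"
};

struct MiniConnection {                   // one per target window
    std::deque<MiniEvent> outboundQueue;  // about to be sent to the window
    std::deque<MiniEvent> waitQueue;      // sent, waiting for the app to finish
};

struct MiniDispatcher {
    std::deque<MiniEvent> inboundQueue;   // filled by the reader side
    MiniConnection connection;

    // Reader side: a new event arrives.
    void notify(const MiniEvent& e) { inboundQueue.push_back(e); }

    // Take the next inbound event and queue it for the target window.
    bool enqueueDispatchEntry() {
        if (inboundQueue.empty()) return false;
        connection.outboundQueue.push_back(inboundQueue.front());
        inboundQueue.pop_front();
        return true;
    }

    // Publish everything in the outbound queue; published events move to waitQueue.
    std::size_t startDispatchCycle() {
        std::size_t published = 0;
        while (!connection.outboundQueue.empty()) {
            connection.waitQueue.push_back(connection.outboundQueue.front());
            connection.outboundQueue.pop_front();
            ++published;
        }
        return published;
    }

    // The app reports that it finished the event with this sequence number.
    bool finish(int seq) {
        for (auto it = connection.waitQueue.begin(); it != connection.waitQueue.end(); ++it) {
            if (it->seq == seq) {
                connection.waitQueue.erase(it);
                return true;
            }
        }
        return false;
    }
};
```

In this model an event sits in waitQueue for as long as the app has not acknowledged it, which is the state the ANR checks later look at.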

Both the ANR flow and the event dispatch flow discussed here start at InputDispatcherThread::threadLoop.

Input event dispatch flow

[Figure: Input event dispatch flow]

Source:

/frameworks/native/libs/input/InputTransport.cpp (InputPublisher and InputChannel)

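This is a complete (simplified) key event flow. The main steps are:

  1. Take an input event from mInboundQueue (dispatchOnceInnerLocked)
  2. Dispatch the key event (dispatchKeyLocked)
  3. Find the target window for the event (findFocusedWindowTargetsLocked)
  4. enqueueDispatchEntriesLocked adds the event to outboundQueue
  5. startDispatchCycleLocked publishes the event; once it has been published successfully, the event is removed from outboundQueue and added to waitQueue
  6. The InputChannel part deserves a detailed analysis of its own later
  7. After the event has been dispatched successfully, the ANR timeout, mPendingEvent, and related state are reset (releasePendingEventLocked)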

The Java-layer flow:
[Figure: Java-layer flow of input event dispatch]

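Input ANR flow

[Figure: Input ANR flow]
Without a close look, it is easy to misunderstand how an input ANR is produced. The analysis breaks it down into three steps:

  1. Step 1 (2 to 8 in the figure above): checkWindowReadyForMoreInputLocked detects that an ANR may occur (for the specific reasons, see "ANR causes" below), and handleTargetsNotReadyLocked computes the time at which the ANR timeout expires.
  2. Step 2 (9 to 16 in the figure above): handleTargetsNotReadyLocked decides whether an ANR has occurred, that is, whether more than 5 s separate the current event from the previous one (ROMs may change this value). More precisely, the interval measured is the one between the last resetANRTimeoutsLocked call and the next dispatchOnceInnerLocked call. If it exceeds 5 s, onANRLocked is called to record the ANR event.
  3. Step 3 (17 to 20 in the figure above): runCommandsLockedInterruptible finally executes the ANR-related commands.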

The entry point for all three steps is InputDispatcher::dispatchOnce(), shown in the code below:

void InputDispatcher::dispatchOnce() {
    ...
    {
        ...
        if (!haveCommandsLocked()) {
            dispatchOnceInnerLocked(&nextWakeupTime); // steps 1 and 2 run here
        }
        ...
        if (runCommandsLockedInterruptible()) { // step 3 runs here
            nextWakeupTime = LONG_LONG_MIN;
        }
    }
    ...
}
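
The control flow of dispatchOnce() can be imitated in a few lines to show why a queued ANR command runs in the same pass that queued it, and why dispatching is skipped while commands are pending. This is only a sketch under simplifying assumptions: MiniLoop, postCommand, anrDeadlinePassed, and the log strings are hypothetical, and the real method also handles locking and polling.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical miniature of the dispatchOnce() control flow; not AOSP code.
class MiniLoop {
public:
    std::vector<std::string> log;    // records what happened, in order
    bool anrDeadlinePassed = false;  // stands in for the timeout check in step 2

    // Queue a deferred command; it runs at the end of a dispatchOnce() pass.
    void postCommand(std::function<void()> cmd) { commands_.push_back(std::move(cmd)); }

    void dispatchOnce() {
        if (commands_.empty()) {
            dispatchOnceInner();              // steps 1 and 2
        }
        if (runCommands()) {                  // step 3
            log.push_back("wake-immediately");
        }
    }

private:
    std::deque<std::function<void()>> commands_;

    void dispatchOnceInner() {
        log.push_back("inner");
        if (anrDeadlinePassed) {
            // The ANR is not handled inline; it is queued as a command.
            postCommand([this] { log.push_back("notify-anr"); });
        }
    }

    // Runs every queued command; returns true if at least one ran.
    bool runCommands() {
        if (commands_.empty()) return false;
        while (!commands_.empty()) {
            std::function<void()> cmd = std::move(commands_.front());
            commands_.pop_front();
            cmd();
        }
        return true;
    }
};
```

With anrDeadlinePassed set, one call to dispatchOnce() records "inner", "notify-anr", and "wake-immediately" in that order. If a command was already queued before the call, "inner" is skipped entirely.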

Key code for the ANR calculation:

void InputDispatcher::dispatchOnceInnerLocked(nsecs_t* nextWakeupTime) {
    nsecs_t currentTime = now();
    ...
    // Ready to start a new event.
    // If we don't already have a pending event, go grab one.
    if (! mPendingEvent) {
        ...
        // Get ready to dispatch the event.
        resetANRTimeoutsLocked();
    }
    ...
    switch (mPendingEvent->type) {
    ...
    case EventEntry::TYPE_KEY: {
        KeyEntry* typedEntry = static_cast<KeyEntry*>(mPendingEvent);
        ...
        done = dispatchKeyLocked(currentTime, typedEntry, &dropReason, nextWakeupTime);
        break;
    }
    ...
    }
    ...
}
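  • currentTime is the time at which the current dispatch pass starts
  • If there is no pending event (mPendingEvent is null), a new event is fetched and resetANRTimeoutsLocked resets the ANR timeout; while an unfinished event is still pending, the timeout is not reset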
int32_t InputDispatcher::handleTargetsNotReadyLocked(nsecs_t currentTime,
        const EventEntry* entry,
        const sp<InputApplicationHandle>& applicationHandle,
        const sp<InputWindowHandle>& windowHandle,
        nsecs_t* nextWakeupTime, const char* reason) {
    if (applicationHandle == NULL && windowHandle == NULL) {
        ...
    } else {
        if (mInputTargetWaitCause != INPUT_TARGET_WAIT_CAUSE_APPLICATION_NOT_READY) {
            ...
            nsecs_t timeout;
            ...
            mInputTargetWaitCause = INPUT_TARGET_WAIT_CAUSE_APPLICATION_NOT_READY;
            mInputTargetWaitStartTime = currentTime;
            mInputTargetWaitTimeoutTime = currentTime + timeout;
            mInputTargetWaitTimeoutExpired = false;
            mInputTargetWaitApplicationHandle.clear();
            if (windowHandle != NULL) {
                mInputTargetWaitApplicationHandle = windowHandle->inputApplicationHandle;
            }
            if (mInputTargetWaitApplicationHandle == NULL && applicationHandle != NULL) {
                mInputTargetWaitApplicationHandle = applicationHandle;
            }
        }
    }
    if (mInputTargetWaitTimeoutExpired) {
        return INPUT_EVENT_INJECTION_TIMED_OUT;
    }
    if (currentTime >= mInputTargetWaitTimeoutTime) {
        onANRLocked(currentTime, applicationHandle, windowHandle,
                entry->eventTime, mInputTargetWaitStartTime, reason);
        ...
        *nextWakeupTime = LONG_LONG_MIN;
        return INPUT_EVENT_INJECTION_PENDING;
    } else {
        ...
        return INPUT_EVENT_INJECTION_PENDING;
    }
}
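  • mInputTargetWaitTimeoutTime is the computed time at which the ANR fires. Its value is the dispatch time from step 1 (2 to 8 in the figure above) plus timeout (5 s here). Because this is the first time handleTargetsNotReadyLocked runs, currentTime < mInputTargetWaitTimeoutTime holds and the method returns right away
  • Step 2 (9 to 16 in the figure above) also reaches handleTargetsNotReadyLocked. This time currentTime >= mInputTargetWaitTimeoutTime holds, so execution continues into onANRLocked, which records the ANR information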
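
The two bullets above can be checked against a small model of the wait-state bookkeeping. This is a sketch under stated assumptions, not the framework's implementation: TargetWaitState, WaitResult, and the method names are hypothetical, times are plain milliseconds rather than nsecs_t, and only the deadline arithmetic is modeled.

```cpp
#include <cstdint>

// Hypothetical simplified model of the wait-state bookkeeping; times are in milliseconds.
enum class WaitResult { Pending, AnrRaised };

struct TargetWaitState {
    bool waiting = false;         // true once a wait has started
    int64_t waitStartTime = 0;    // when the current wait started
    int64_t waitTimeoutTime = 0;  // when the current wait turns into an ANR
    int anrCount = 0;             // how many times the deadline was hit

    // Called when the dispatcher starts on a fresh event.
    void resetAnrTimeouts() { waiting = false; }

    // Called each time the target window is found not ready.
    WaitResult handleTargetsNotReady(int64_t currentTime, int64_t timeout) {
        if (!waiting) {
            // First pass: only compute the deadline.
            waiting = true;
            waitStartTime = currentTime;
            waitTimeoutTime = currentTime + timeout;
        }
        if (currentTime >= waitTimeoutTime) {
            // Later pass: the deadline has been reached.
            ++anrCount;
            return WaitResult::AnrRaised;
        }
        return WaitResult::Pending;
    }
};
```

Calling handleTargetsNotReady at t = 1000 with a 5000 ms timeout sets the deadline to 6000 and returns Pending. A call at t = 4000 is still Pending, and a call at t = 6000 returns AnrRaised. After resetAnrTimeouts(), the next call starts a new wait with a fresh deadline.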

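Input ANR handling flow

[Figure: Input ANR handling flow]

The system_process process (system_server) handles the ANR. A few points in the flow are worth noting:

  • Step 2: InputMonitor#notifyANR determines which process the ANR occurred in
  • Step 5: AppErrors#appNotResponding prints the ANR log output
  • Step 6: ActivityManagerService#dumpStackTraces collects the ANR trace log
  • Step 8: AppErrors#handleShowAnrUi shows the ANR dialog to the user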

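Input ANR information collection flow

[Figure: Input ANR information collection flow]

The ANR trace log is the key to solving an ANR. The system_process process uses sendSignal to send SIGNAL_QUIT to the process in which the ANR occurred, and that process then collects information about all of its threads on its Signal Catcher thread.

Source: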

/art/runtime/signal_catcher.cc
/art/runtime/runtime.cc
/art/runtime/thread_list.cc
/art/runtime/thread.cc
/art/runtime/thread_state.h (defines the thread states)

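The resulting ANR trace log can be seen in the trace log section above. It contains a lot of information: Classloader Info, Intern Table, JavaVM, Heap, oat, JIT, Deoptimizations, ThreadList, and so on. The most important part is the thread stacks. ThreadList collects two kinds of threads, attached and not attached. For the details, see Thread::DumpState and Thread::DumpStack, which print the thread state and the stack respectively.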
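
The general POSIX pattern behind a signal-catcher design is to block SIGQUIT so that it never interrupts arbitrary code, and then wait for it synchronously in one place. The sketch below shows only that pattern and is not ART's implementation: blockSigquit and waitForDumpRequest are hypothetical names, and it uses sigprocmask for a single-threaded demonstration, where a multithreaded runtime would use pthread_sigmask and run the wait on a dedicated thread.

```cpp
#include <signal.h>

// Hypothetical helper: block SIGQUIT so it stays pending instead of killing the process.
// Single-threaded sketch; a multithreaded program would use pthread_sigmask instead.
bool blockSigquit(sigset_t* set) {
    if (sigemptyset(set) != 0) return false;
    if (sigaddset(set, SIGQUIT) != 0) return false;
    return sigprocmask(SIG_BLOCK, set, nullptr) == 0;
}

// Hypothetical helper: wait until one of the signals in the set is pending.
// Returns the signal number, or -1 if sigwait fails.
int waitForDumpRequest(const sigset_t* set) {
    int sig = 0;
    if (sigwait(set, &sig) != 0) return -1;
    // A runtime would dump every thread's state and stack at this point.
    return sig;
}
```

A caller would invoke blockSigquit once early in the process and then loop on waitForDumpRequest. Each SIGQUIT that arrives is consumed by the wait call rather than by the default action, which would otherwise terminate the process.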

ANR causes

The reasons an ANR can be produced came up with checkWindowReadyForMoreInputLocked in the input ANR flow. They are:

  1. No window, but there is an application: Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.
  2. Window paused: Waiting because the 【targetType】 window is paused.
  3. Window not connected: Waiting because the 【targetType】 window's input channel is not registered with the input dispatcher. The window may be in the process of being removed.
  4. Window connection is dead: Waiting because the 【targetType】 window's input connection is 【StatusLabel】. The window may be in the process of being removed.
  5. Window connection is full: Waiting because the 【targetType】 window's input channel is full. Outbound queue length: 【outboundQueue.count】. Wait queue length: 【waitQueue.count】.
  6. Key event, and the outbound queue or the wait queue is not empty: Waiting to send key event because the 【targetType】 window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: 【outboundQueue.count】. Wait queue length: 【waitQueue.count】.
  7. Non-key event, and the wait queue is not empty and its head event was delivered more than 500 ms ago: Waiting to send non-key event because the 【targetType】 window has not finished processing certain input events that were delivered to it over 【STREAM_AHEAD_EVENT_TIMEOUT】 ago. Wait queue length: 【waitQueue.count】. Wait queue head age: 【wait duration】.
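
Cases 6 and 7 differ only in how strict the check is, which a short sketch can make explicit. The code below is an illustration under assumptions rather than the framework's function: QueueSnapshot, checkReady, kStreamAheadTimeoutMs, and the returned strings are hypothetical, and only the 500 ms threshold and the two queue conditions come from the list above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, reduced view of one connection's queues.
struct QueueSnapshot {
    std::size_t outboundCount = 0;       // events not yet published
    std::size_t waitCount = 0;           // events published but not yet finished
    int64_t waitHeadDeliveryTimeMs = 0;  // when the oldest waiting event was delivered
};

const int64_t kStreamAheadTimeoutMs = 500;  // the 500 ms threshold from case 7

// Returns an empty string when the window can take the event, otherwise a reason.
std::string checkReady(const QueueSnapshot& q, bool isKeyEvent, int64_t nowMs) {
    if (isKeyEvent) {
        // Case 6: key events wait until every earlier event has been finished.
        if (q.outboundCount != 0 || q.waitCount != 0) {
            return "key event blocked: earlier events not finished";
        }
    } else {
        // Case 7: other events may stream ahead unless the oldest waiting
        // event has been unanswered for longer than the threshold.
        if (q.waitCount != 0 &&
            nowMs >= q.waitHeadDeliveryTimeMs + kStreamAheadTimeoutMs) {
            return "non-key event blocked: wait queue head too old";
        }
    }
    return "";
}
```

With one event delivered at t = 1000 and still waiting, a key event at t = 1001 is blocked immediately, while a touch event is only blocked from t = 1500 onward.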

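Possible misunderstandings about ANR

It is easy to assume that an input ANR occurs whenever the main thread runs for more than 5 s. The analysis above shows that this is not the case. An input ANR requires at least two executions of InputDispatcher::dispatchOnce: the first detects that an ANR may occur and computes the time at which it would fire, and only in the second does InputDispatcher::onANRLocked run the ANR handling logic.

Tips for handling ANRs

  1. Search globally for "ANR in" to locate the cause of an ANR
  2. The ANR log is ultimately collected via sendSignal(SIGNAL_QUIT), which can serve as a reference for implementing ANR monitoring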
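
As a starting point for the first tip, the search for "ANR in" can be automated. The sketch below assumes the log text has the shape of the sample in the log output section ("ANR in <process> ..." followed later by a "Reason: ..." line); AnrHeader and findAnr are hypothetical names, and real logcat lines carry extra prefixes that this does not strip.

```cpp
#include <cstddef>
#include <string>

struct AnrHeader {
    std::string process;  // text after "ANR in " up to the next space or newline
    std::string reason;   // text after "Reason: " up to the end of that line; may be empty
};

// Hypothetical helper: finds the first "ANR in" entry in the given log text.
// Returns false if the marker is not present.
bool findAnr(const std::string& logText, AnrHeader* out) {
    const std::string marker = "ANR in ";
    const std::size_t start = logText.find(marker);
    if (start == std::string::npos) return false;

    const std::size_t nameBegin = start + marker.size();
    const std::size_t nameEnd = logText.find_first_of(" \n", nameBegin);
    out->process = logText.substr(
        nameBegin, nameEnd == std::string::npos ? std::string::npos : nameEnd - nameBegin);

    out->reason.clear();
    const std::string reasonMarker = "Reason: ";
    const std::size_t reasonPos = logText.find(reasonMarker, nameBegin);
    if (reasonPos != std::string::npos) {
        const std::size_t reasonBegin = reasonPos + reasonMarker.size();
        const std::size_t reasonEnd = logText.find('\n', reasonBegin);
        out->reason = logText.substr(
            reasonBegin,
            reasonEnd == std::string::npos ? std::string::npos : reasonEnd - reasonBegin);
    }
    return true;
}
```

Fed the sample log from earlier in this article, the process field would be com.bugsnag.android.example and the reason field would start with "Input dispatching timed out".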
