A Guide to the Android Input ANR Mechanism

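Every Android engineer has probably run into ANR (Application Not Responding). When an app cannot respond in time, the system shows an ANR dialog that lets the user choose between waiting longer and force-closing the app.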

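A while ago, during a preinstall project, the device vendor reported quite a few ANRs, so I took the opportunity to review how ANR works. ANRs also make up a large share of app stability monitoring, and a solid understanding of the ANR flow helps when building that monitoring. I hope this material on Input ANR is useful to you.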

ANR information

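After an ANR occurs, three main pieces of information help developers fix it: the log shows why the ANR happened, the trace shows the call stacks where the time was spent, and the dialog tells the user that an ANR has occurred.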

Log output

ANR in com.bugsnag.android.example (com.bugsnag.android.example/.ExampleActivity)
    PID: 5353
    Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago.  Wait queue length: 7.  Wait queue head age: 5518.6ms.)
    Load: 0.19 / 0.13 / 0.05
    CPU usage from 0ms to 9178ms later (2019-09-28 20:29:00.987 to 2019-09-28 20:29:10.164):
      5.9% 5353/com.bugsnag.android.example: 3.1% user + 2.8% kernel / faults: 840 minor
      3.7% 1783/system_server: 1.1% user + 2.5% kernel / faults: 471 minor
    .....  
    7.5% TOTAL: 2.8% user + 4.7% kernel

Trace file

----- pid 1783 at 2019-09-28 18:04:06 -----
Cmd line: system_server
Build fingerprint: 'google/sdk_gphone_x86/generic_x86:8.1.0/OSM1.180201.031/5455776:userdebug/dev-keys'
ABI: 'x86'
Build type: optimized
### Classloader Info
Zygote loaded classes=5300 post zygote classes=3590
### Intern Table
Intern table: 47863 strong; 1329 weak
### JavaVM
JNI: CheckJNI is on; globals=2309 (plus 58 weak)
Libraries: /system/lib/libandroid.so /system/lib/libandroid_servers.so /system/lib/libcompiler_rt.so /system/lib/libjavacrypto.so /system/lib/libjnigraphics.so /system/lib/libmedia_jni.so /system/lib/libsoundpool.so /system/lib/libwebviewchromium_loader.so /system/lib/libwifi-service.so libjavacore.so libopenjdk.so (11)
### Heap
Heap: 1% free, 14MB/15MB; 252408 objects
Dumping cumulative Gc timings
...
### oat
/system/priv-app/Telecom/oat/x86/Telecom.odex: speed
/system/framework/oat/x86/services.odex: speed
/system/framework/oat/x86/ethernet-service.odex: speed
/system/framework/oat/x86/wifi-service.odex: speed
/system/framework/oat/x86/com.android.location.provider.odex: speed
/system/priv-app/SettingsProvider/oat/x86/SettingsProvider.odex: speed
/system/priv-app/FusedLocation/oat/x86/FusedLocation.odex: speed
/system/framework/oat/x86/com.android.location.provider.odex: speed
Running non JIT
### ThreadList: split into attached threads and not-attached threads
suspend all histogram:	Sum: 2.261ms 99% C.I. 0.133us-782.400us Avg: 75.366us Max: 811us
DALVIK THREADS (102):
"Signal Catcher" daemon prio=5 tid=2 Runnable
  | group="system" sCount=0 dsCount=0 flags=0 obj=0x12c80000 self=0xe4488000
  | sysTid=1788 nice=0 cgrp=default sched=0/0 handle=0xe4750970
  | state=R schedstat=( 6939261 18404 8 ) utm=0 stm=0 core=2 HZ=100
  | stack=0xe4656000-0xe4658000 stackSize=1006KB
  | held mutexes= "mutator lock"(shared held)
  native: #00 pc 0047eae1  /system/lib/libart.so (art::DumpNativeStack(std::__1::basic_ostream<char, std::__1::char_traits<char>>&, int, BacktraceMap*, char const*, art::ArtMethod*, void*)+209)
  
.....
"android.anim" prio=5 (not attached)
  | sysTid=5358 nice=-10 cgrp=default
  | state=S schedstat=( 860788 0 3 ) utm=0 stm=0 core=2 HZ=100
  kernel: futex_wait_queue_me+0xdf/0x130
  kernel: futex_wait+0x113/0x22e
  kernel: do_futex+0xbe/0x825
  kernel: compat_SyS_futex+0xb7/0x129
  kernel: sysenter_dispatch+0x7/0x1a
----- end 1783 -----

Dialog

Input mechanism flow

Main source files:

/frameworks/native/services/inputflinger/InputDispatcher.cpp
/frameworks/native/services/inputflinger/InputReader.cpp
/frameworks/native/services/inputflinger/EventHub.cpp

The InputDispatcher (InputDispatcherThread) thread dispatches input events to their target windows. It relies on several important queues:

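mInBoundQueue holds the input events sent over by InputReader
outBoundQueue holds input events that are about to be dispatched to the target app window
waitQueue holds input events that have been dispatched to the target app but that the app has not finished processing yet
mCommandQueue holds special commands to be run later (such as the ANR notification discussed here)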
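To make the relationship between these queues concrete, here is a hypothetical, heavily simplified C++ model. The types (`Event`, `Connection`, `Dispatcher`) and helpers (`dispatchOne`, `publishOne`, `finishOne`) are illustrative assumptions, not AOSP code. The one structural detail taken from AOSP is that the outbound and wait queues belong to each per-window connection, while the inbound queue is global to the dispatcher.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical, simplified model of the dispatcher queues (not AOSP types).
struct Event {
    int id;
    int64_t deliveryTimeNs;  // set when the event is handed to the app
};

// As in AOSP, the outbound and wait queues belong to one window's connection.
struct Connection {
    std::deque<Event> outboundQueue;  // about to be sent to this window
    std::deque<Event> waitQueue;      // sent, but not yet finished by the app
};

struct Dispatcher {
    std::deque<Event> mInboundQueue;  // filled by InputReader, shared by all windows

    void enqueueInbound(Event e) { mInboundQueue.push_back(e); }

    // Move the oldest inbound event to the target window's outbound queue.
    bool dispatchOne(Connection& target) {
        if (mInboundQueue.empty()) return false;
        target.outboundQueue.push_back(mInboundQueue.front());
        mInboundQueue.pop_front();
        return true;
    }

    // Hand the head of the outbound queue to the app; it now waits for a reply.
    static bool publishOne(Connection& c, int64_t nowNs) {
        if (c.outboundQueue.empty()) return false;
        Event e = c.outboundQueue.front();
        c.outboundQueue.pop_front();
        e.deliveryTimeNs = nowNs;
        c.waitQueue.push_back(e);
        return true;
    }

    // The app reports that it finished the oldest event it received.
    static void finishOne(Connection& c) {
        if (!c.waitQueue.empty()) c.waitQueue.pop_front();
    }
};
```

In this model an event leaves the wait queue only when the app reports that it has finished it; the time elapsed since the head entry's `deliveryTimeNs` corresponds to the "Wait queue head age" shown in the log excerpt above.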

The figure below shows how mInBoundQueue, outBoundQueue, waitQueue and mCommandQueue relate to one another.

Both the ANR logic and the event dispatch flow discussed here start from InputDispatcherThread::threadLoop.

Input event dispatch flow

Code:

/frameworks/native/libs/input/InputTransport.cpp (InputPublisher and InputChannel)

Here is a complete (simplified) key event flow. The main steps are:

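Take an input event from mInBoundQueue (dispatchOnceInnerLocked)
Dispatch the key event (dispatchKeyLocked)
Find the target window for the event (findFocusedWindowTargetsLocked)
enqueueDispatchEntriesLocked adds the event to outBoundQueue
startDispatchCycleLocked dispatches the event; when publishing succeeds, the event is removed from outBoundQueue and appended to waitQueue (on failure it either stays in outBoundQueue or the connection is aborted)
The InputChannel part deserves its own detailed analysis later
Once the event has been dispatched, the ANR state, mPendingEvent and so on are reset (releasePendingEventLocked)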
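The success/failure handling in startDispatchCycleLocked can be sketched as follows. This is a simplification of the AOSP 8.x logic, assuming a hypothetical `publish` callback in place of InputPublisher and a hypothetical three-valued `PublishResult`; the real function also sets delivery times, handles motion events and updates tracing.

```cpp
#include <cassert>
#include <deque>

// Hypothetical result of handing one event to the app's InputChannel.
enum class PublishResult { Ok, WouldBlock, Broken };

struct Connection {
    std::deque<int> outboundQueue;  // event ids waiting to be sent
    std::deque<int> waitQueue;      // event ids sent but not yet finished
    bool inputPublisherBlocked = false;
    bool broken = false;
};

// Simplified sketch of startDispatchCycleLocked's result handling.
template <typename PublishFn>
void startDispatchCycle(Connection& c, PublishFn publish) {
    while (!c.broken && !c.outboundQueue.empty()) {
        int id = c.outboundQueue.front();
        PublishResult r = publish(id);
        if (r == PublishResult::WouldBlock && !c.waitQueue.empty()) {
            // Channel is full while the app still owes replies: keep the
            // event in outboundQueue and retry after the app catches up.
            c.inputPublisherBlocked = true;
            return;
        }
        if (r != PublishResult::Ok) {
            // Any other failure (including a full channel with nothing
            // outstanding) breaks the dispatch cycle for this window.
            c.broken = true;
            return;
        }
        // Success: move the event from outboundQueue to waitQueue.
        c.outboundQueue.pop_front();
        c.waitQueue.push_back(id);
    }
}
```

The point of the sketch is the direction of the move: only a successfully published event lands in the wait queue, which is why a slow app shows up as a growing "Wait queue length".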

The Java-layer flow:

Input ANR flow

Without a careful look, the way an Input ANR is produced is easy to misunderstand. Broken down, it takes three steps:

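Step 1 (steps 2 to 8 in the figure above): checkWindowReadyForMoreInputLocked detects a possible ANR (see "ANR causes" below for the specific reasons), and handleTargetsNotReadyLocked computes the time at which the ANR will fire
Step 2 (steps 9 to 16 in the figure above): handleTargetsNotReadyLocked checks whether an ANR has occurred, i.e. whether more than 5 s have elapsed (some ROMs change this value). More precisely, the gap between a later dispatchOnceInnerLocked pass and the last resetANRTimeoutsLocked call must exceed 5 s. It then calls onANRLocked to record the ANR
Step 3 (steps 17 to 20 in the figure above): runCommandsLockedInterruptible finally executes the ANR command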

The entry points of the three steps can be seen in InputDispatcher::dispatchOnce():

void InputDispatcher::dispatchOnce() {
    ...
    {
        ...
        if (!haveCommandsLocked()) {
            dispatchOnceInnerLocked(&nextWakeupTime); // steps 1 and 2 run here
        }
        ...
        if (runCommandsLockedInterruptible()) { // step 3 runs here
            nextWakeupTime = LONG_LONG_MIN;
        }
    }
    ...
}
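A minimal sketch of how dispatchOnce() interleaves the three steps, using hypothetical names (`MiniDispatcher`, `innerDispatch`, `runCommands`): inner dispatch runs only when no commands are queued, and any queued command (in AOSP, onANRLocked posts the ANR notification as such a command) runs afterwards and forces an immediate next wakeup. Locking and the Looper wait are left out.

```cpp
#include <cassert>
#include <climits>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical model of dispatchOnce(); not the AOSP implementation.
struct MiniDispatcher {
    std::deque<std::function<void()>> mCommandQueue;     // deferred commands
    std::function<void(MiniDispatcher&)> innerDispatch;  // stands in for dispatchOnceInnerLocked
    long long nextWakeupTime = LLONG_MAX;

    bool haveCommands() const { return !mCommandQueue.empty(); }

    // Run every queued command, including ones queued by earlier commands.
    bool runCommands() {
        if (mCommandQueue.empty()) return false;
        while (!mCommandQueue.empty()) {
            auto cmd = std::move(mCommandQueue.front());
            mCommandQueue.pop_front();
            cmd();
        }
        return true;
    }

    void dispatchOnce() {
        nextWakeupTime = LLONG_MAX;
        if (!haveCommands() && innerDispatch) innerDispatch(*this);  // steps 1 and 2
        if (runCommands()) nextWakeupTime = LLONG_MIN;              // step 3, wake up again at once
    }
};
```

Deferring the ANR work to a command keeps the detection logic (steps 1 and 2) separate from the notification (step 3), which in AOSP calls out to the Java side.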

Key code for the ANR calculation:

void InputDispatcher::dispatchOnceInnerLocked(nsecs_t* nextWakeupTime) {
    nsecs_t currentTime = now();
    ...
    
    // Ready to start a new event.
    // If we don't already have a pending event, go grab one.
    if (! mPendingEvent) {
        ...
        // Get ready to dispatch the event.
        resetANRTimeoutsLocked();
    }
    ...
    switch (mPendingEvent->type) {
    ...
    case EventEntry::TYPE_KEY: {
        KeyEntry* typedEntry = static_cast<KeyEntry*>(mPendingEvent);
        ...
        done = dispatchKeyLocked(currentTime, typedEntry, &dropReason, nextWakeupTime);
        break;
    }
    ...
    }
    ...
}

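currentTime is the time at which the current dispatch pass starts
If there is no pending event (mPendingEvent is null), a new event is taken and resetANRTimeoutsLocked resets the ANR state before it is dispatched; while an event is still pending, later passes do not reset it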
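The reset rule can be checked with a small hypothetical model. `PendingSketch`, `waitingForTarget` and `resetCount` are illustrative names, not AOSP fields; `waitingForTarget` stands in for the wait state that resetANRTimeoutsLocked clears. The sketch assumes, as in the release step of the dispatch flow above, that releasing a finished event also resets the ANR state.

```cpp
#include <cassert>
#include <deque>
#include <optional>

// Hypothetical sketch of the `if (!mPendingEvent)` branch in dispatchOnceInnerLocked.
struct PendingSketch {
    std::deque<int> mInboundQueue;
    std::optional<int> mPendingEvent;
    bool waitingForTarget = false;  // stands in for the ANR wait state
    int resetCount = 0;

    void resetANRTimeoutsLocked() {
        waitingForTarget = false;
        ++resetCount;
    }

    // One dispatch pass; returns the event being dispatched, or -1 when idle.
    int dispatchOnceInner() {
        if (!mPendingEvent) {
            if (mInboundQueue.empty()) return -1;
            mPendingEvent = mInboundQueue.front();
            mInboundQueue.pop_front();
            resetANRTimeoutsLocked();  // only when a new event is taken
        }
        return *mPendingEvent;
    }

    // Mirrors releasePendingEventLocked: reset the ANR state, then drop the event.
    void releasePendingEvent() {
        if (mPendingEvent) {
            resetANRTimeoutsLocked();
            mPendingEvent.reset();
        }
    }
};
```

Because a still-pending event is never reset, the wait that started on the first pass keeps counting across later passes, which is what lets the 5 s deadline expire.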

int32_t InputDispatcher::handleTargetsNotReadyLocked(nsecs_t currentTime,
        const EventEntry* entry,
        const sp<InputApplicationHandle>& applicationHandle,
        const sp<InputWindowHandle>& windowHandle,
        nsecs_t* nextWakeupTime, const char* reason) {
    if (applicationHandle == NULL && windowHandle == NULL) {
        ...
    } else {
        if (mInputTargetWaitCause != INPUT_TARGET_WAIT_CAUSE_APPLICATION_NOT_READY) {
            ...
            nsecs_t timeout;
            ...
            mInputTargetWaitCause = INPUT_TARGET_WAIT_CAUSE_APPLICATION_NOT_READY;
            mInputTargetWaitStartTime = currentTime;
            mInputTargetWaitTimeoutTime = currentTime + timeout;
            mInputTargetWaitTimeoutExpired = false;
            mInputTargetWaitApplicationHandle.clear();
            if (windowHandle != NULL) {
                mInputTargetWaitApplicationHandle = windowHandle->inputApplicationHandle;
            }
            if (mInputTargetWaitApplicationHandle == NULL && applicationHandle != NULL) {
                mInputTargetWaitApplicationHandle = applicationHandle;
            }
        }
    }
    if (mInputTargetWaitTimeoutExpired) {
        return INPUT_EVENT_INJECTION_TIMED_OUT;
    }
    if (currentTime >= mInputTargetWaitTimeoutTime) {
        onANRLocked(currentTime, applicationHandle, windowHandle,
                entry->eventTime, mInputTargetWaitStartTime, reason);
        ...
        *nextWakeupTime = LONG_LONG_MIN;
        return INPUT_EVENT_INJECTION_PENDING;
    } else {
        ...
        return INPUT_EVENT_INJECTION_PENDING;
    }
}

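mInputTargetWaitTimeoutTime is the computed time at which the ANR fires: the dispatch time in step 1 (steps 2 to 8 in the figure above) plus timeout (5 s here). Because this is the first call to handleTargetsNotReadyLocked, currentTime < mInputTargetWaitTimeoutTime holds and the method returns right away
Step 2 (steps 9 to 16 in the figure above) also reaches handleTargetsNotReadyLocked; now currentTime >= mInputTargetWaitTimeoutTime holds, so it goes on to call onANRLocked to record the ANR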
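The deadline arithmetic of these two calls can be sketched in isolation. This is a hypothetical reduction (`TargetWait`, `anrCount`, `kDefaultDispatchingTimeout` are illustrative names) that assumes the 5 s default timeout; the real method also tracks mInputTargetWaitTimeoutExpired, the application handle and the wake-up time.

```cpp
#include <cassert>
#include <cstdint>

using nsecs_t = int64_t;
constexpr nsecs_t kDefaultDispatchingTimeout = 5000LL * 1000000LL;  // assumed 5 s, in ns

enum class WaitResult { Pending, Anr };

// Sketch of the deadline logic in handleTargetsNotReadyLocked.
struct TargetWait {
    bool waitingForApp = false;  // like mInputTargetWaitCause == APPLICATION_NOT_READY
    nsecs_t startTime = 0;       // like mInputTargetWaitStartTime
    nsecs_t timeoutTime = 0;     // like mInputTargetWaitTimeoutTime
    int anrCount = 0;

    WaitResult handleTargetsNotReady(nsecs_t currentTime,
                                     nsecs_t timeout = kDefaultDispatchingTimeout) {
        if (!waitingForApp) {
            // First call after a reset: start waiting and compute the deadline.
            waitingForApp = true;
            startTime = currentTime;
            timeoutTime = currentTime + timeout;
        }
        if (currentTime >= timeoutTime) {
            ++anrCount;  // the real code calls onANRLocked(...) here
            return WaitResult::Anr;
        }
        return WaitResult::Pending;
    }

    void reset() { waitingForApp = false; }  // like resetANRTimeoutsLocked
};
```

For example, if the first call happens at t = 1000 ms, the deadline is 6000 ms; a call at 5999 ms is still pending and a call at 6000 ms reports the ANR.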

Input ANR handling flow

The system_process process handles the ANR. A few key points in this flow:

Step 2: InputMonitor#notifyANR determines which process the ANR belongs to
Step 5: AppErrors#appNotResponding prints the ANR log
Step 6: ActivityManagerService#dumpStackTraces collects the ANR trace
Step 8: AppErrors#handleShowAnrUi shows the ANR dialog to the user

Input ANR information collection flow

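The ANR trace is the key to solving ANR problems. The system_process process uses sendSignal to send SIGNAL_QUIT to the process that hit the ANR, and that process's Signal Catcher thread then collects information about all of its threads.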

Code:

/art/runtime/signal_catcher.cc
/art/runtime/runtime.cc
/art/runtime/thread_list.cc
/art/runtime/thread.cc
/art/runtime/thread_state.h (defines the thread states)
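The SIGQUIT handshake described above can be illustrated with plain POSIX calls. This is a hedged sketch of the general pattern a "Signal Catcher" thread follows (block SIGQUIT in every thread, then wait for it synchronously on one dedicated thread); it is not ART's code, and `catchOneSigquit` is a hypothetical name. The process signals itself with `kill`, standing in for system_server calling `Process.sendSignal(pid, Process.SIGNAL_QUIT)`.

```cpp
#include <cassert>
#include <pthread.h>
#include <signal.h>
#include <thread>
#include <unistd.h>

// Hypothetical sketch: receive one SIGQUIT on a dedicated catcher thread.
int catchOneSigquit() {
    sigset_t set, oldSet;
    sigemptyset(&set);
    sigaddset(&set, SIGQUIT);
    // Block SIGQUIT before creating the thread so it inherits the mask and
    // the signal stays pending instead of running the default action.
    pthread_sigmask(SIG_BLOCK, &set, &oldSet);

    int received = 0;
    std::thread catcher([&set, &received] {
        int sig = 0;
        if (sigwait(&set, &sig) == 0) {
            received = sig;  // a runtime would dump all thread stacks here
        }
    });

    kill(getpid(), SIGQUIT);  // stands in for the signal sent by system_server
    catcher.join();

    pthread_sigmask(SIG_SETMASK, &oldSet, nullptr);  // restore the caller's mask
    return received;
}
```

Handling the signal on an ordinary thread via `sigwait`, rather than in an asynchronous handler, means the dump code is not limited to async-signal-safe functions, which is why this design suits a full stack dump.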

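See the trace excerpt above for an example of a generated ANR trace. It contains a great deal of information: Classloader Info, Intern Table, JavaVM, Heap, oat, JIT, Deoptimizations, ThreadList and more. The most important part is the thread stacks. ThreadList collects two kinds of threads, attached and not attached; see Thread::DumpState and Thread::DumpStack, which print the thread state and the stack respectively.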

ANR causes

As noted in the Input ANR flow, checkWindowReadyForMoreInputLocked determines why the window cannot accept more input, and that becomes the ANR reason:

No focused window, but a focused application: Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.
Window paused: Waiting because the [targetType] window is paused.
Window not registered: Waiting because the [targetType] window’s input channel is not registered with the input dispatcher. The window may be in the process of being removed.
Window connection dead: Waiting because the [targetType] window’s input connection is [StatusLabel]. The window may be in the process of being removed.
Window input channel full: Waiting because the [targetType] window’s input channel is full. Outbound queue length: [outboundQueue.count]. Wait queue length: [waitQueue.count].
Key event, outbound queue or wait queue not empty: Waiting to send key event because the [targetType] window has not finished processing all of the input events that were previously delivered to it. Outbound queue length: [outboundQueue.count]. Wait queue length: [waitQueue.count].
Non-key event, wait queue not empty and its head dispatched more than 500 ms ago: Waiting to send non-key event because the [targetType] window has not finished processing certain input events that were delivered to it over [STREAM_AHEAD_EVENT_TIMEOUT] ago. Wait queue length: [waitQueue.count]. Wait queue head age: [time waited].
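The last two checks can be sketched as a function that builds the reason string. This is a hypothetical reduction (`Connection` here only stores delivery times, and `checkWindowReady` is an illustrative name) with message text modeled on the strings listed above; it assumes STREAM_AHEAD_EVENT_TIMEOUT is 500 ms, as stated in the last entry.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <deque>
#include <string>

using nsecs_t = int64_t;
constexpr nsecs_t STREAM_AHEAD_EVENT_TIMEOUT = 500LL * 1000000LL;  // 500 ms, in ns

struct Connection {
    std::deque<nsecs_t> outboundQueue;  // only the length matters here
    std::deque<nsecs_t> waitQueue;      // each entry: delivery time in ns
};

// Sketch of the key / non-key checks; returns "" if the window can take input.
std::string checkWindowReady(const Connection& c, bool isKeyEvent,
                             nsecs_t currentTime, const char* targetType) {
    char buf[512];
    if (isKeyEvent) {
        // Key events wait until everything previously sent has been handled.
        if (!c.outboundQueue.empty() || !c.waitQueue.empty()) {
            std::snprintf(buf, sizeof buf,
                "Waiting to send key event because the %s window has not finished "
                "processing all of the input events that were previously delivered to it.  "
                "Outbound queue length: %zu.  Wait queue length: %zu.",
                targetType, c.outboundQueue.size(), c.waitQueue.size());
            return buf;
        }
    } else if (!c.waitQueue.empty() &&
               currentTime >= c.waitQueue.front() + STREAM_AHEAD_EVENT_TIMEOUT) {
        // Non-key events may stream ahead until the oldest one is 500 ms old.
        std::snprintf(buf, sizeof buf,
            "Waiting to send non-key event because the %s window has not finished "
            "processing certain input events that were delivered to it over %0.1fms ago.  "
            "Wait queue length: %zu.  Wait queue head age: %0.1fms.",
            targetType, STREAM_AHEAD_EVENT_TIMEOUT * 0.000001,
            c.waitQueue.size(), (currentTime - c.waitQueue.front()) * 0.000001);
        return buf;
    }
    return "";
}
```

With seven events in the wait queue, the oldest delivered 5518.6 ms ago, this produces the same "Wait queue length: 7. Wait queue head age: 5518.6ms." figures as the log excerpt at the top of the article.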

Possible misconceptions about ANR

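A common misconception is that an Input ANR occurs whenever the main thread runs for more than 5 s. The analysis above shows this is not the case: an Input ANR requires at least two executions of InputDispatcher::dispatchOnce. The first detects a possible ANR and computes when it will fire; only the second reaches InputDispatcher::onANRLocked, which handles the ANR.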

ANR troubleshooting tips

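Search the logs globally for "ANR in" to locate the cause of an ANR
The ANR trace is ultimately collected via sendSignal(SIGNAL_QUIT); this can serve as a reference when implementing ANR monitoring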
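The "ANR in" search can also be scripted. A minimal sketch, assuming the log text is already available as a string and follows the format of the log excerpt at the top of this article; `findAnrs` and `AnrRecord` are hypothetical names.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

struct AnrRecord {
    std::string process;  // text after "ANR in "
    std::string reason;   // text after "Reason: " (empty if none found)
};

// Scan log text for "ANR in" lines and pair each with the first "Reason:" line after it.
std::vector<AnrRecord> findAnrs(const std::string& logText) {
    std::vector<AnrRecord> out;
    std::istringstream in(logText);
    std::string line;
    const std::string anrTag = "ANR in ";
    const std::string reasonTag = "Reason: ";
    while (std::getline(in, line)) {
        auto p = line.find(anrTag);
        if (p != std::string::npos) {
            out.push_back({line.substr(p + anrTag.size()), ""});
            continue;
        }
        auto r = line.find(reasonTag);
        if (r != std::string::npos && !out.empty() && out.back().reason.empty()) {
            out.back().reason = line.substr(r + reasonTag.size());
        }
    }
    return out;
}
```

The reason string is what maps back to the causes listed under "ANR causes", for example "Waiting to send non-key event ...".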
