1、 ANR的定义
ANR,是“Application Not Responding”的缩写,即“应用程序无响应”。如果你应用程序在UI线程被阻塞太长时间,就会出现ANR,通常出现ANR,系统会弹出一个提示提示框,让用户知道,该程序正在被阻塞,是否继续等待还是关闭。
2、ANR的类型
(1)KeyDispatchTimeout(常见)
input事件在5S内没有处理完成发生了ANR。
logcat日志关键字:Input event dispatching timed out
(2)BroadcastTimeout
前台Broadcast:onReceiver在10S内没有处理完成发生ANR。
后台Broadcast:onReceiver在60s内没有处理完成发生ANR。
logcat日志关键字:Timeout of broadcast BroadcastRecord
(3)ServiceTimeout
前台Service:onCreate,onStart,onBind等生命周期在20s内没有处理完成发生ANR。
后台Service:onCreate,onStart,onBind等生命周期在200s内没有处理完成发生ANR
logcat日志关键字:Timeout executing service
(4)ContentProviderTimeout
ContentProvider 在10S内没有处理完成发生ANR。
logcat日志关键字:timeout publishing content providers
3、ANR出现的原因
(1)主线程频繁进行耗时的IO操作:如数据库读写
(2)多线程操作的死锁,主线程被block;
(3)主线程被Binder 对端block;
(4)System Server中WatchDog出现ANR;
(5)service binder的连接达到上线无法和和System Server通信
(6)系统资源已耗尽(管道、CPU、IO)
4、如何分析ANR
(1)日志分析:ANR发生时都会在log中输出错误信息,从log中可以获得ANR的类型,CPU的使用情况,CPU使用率过高有可能是CPU饥饿导致了ANR。CPU使用率过低说明主线程被block了,如果IOwAIt高是因为主线程进行I/O操作造成的。
(2)traces文件分析:除了log输出外,你会发现各个应用进程和系统进程的函数堆栈信息都输出到了一个/data/anr/traces.txt的文件中,这个文件是分析ANR原因的关键文件.要获取到该文件可使用adb指令进行赋权后拉出查看调用stack。通过log、trace.text、代码结合分析ANR的成因(iowait?Memoryleak?Block?)
(3)traces文件无法分析的:不过还存在一些ANR问题,trace文件是分析不了的,例如我们的系统上,人脸识别活体攻击的时候,native算法耗尽cpu资源导致其他app无法抢占cpu时间片导致anr,假如ANR的app是你开发的,估计查到死也找不到问题所在,类似这类问题也写过简要的分析文章:
接下来我们将一步一步分析ANR,这个过程能让我们进一步明白如何找到问题、分析问题以及解决问题。
当ANR发生的时候,系统会调用如下相关的关键函数代码,来将系统当前的关键信息保存到日志当中:
Javaframework/base/services/core/JAVA/com/Android/server/am/AppErrors.java
class AppErrors {
final void appNotResponding(ProcessRecord app, ActivityRecord activity,
ActivityRecord parent, boolean aboveSystem, final String annotation) {
ArrayList<Integer> firstPids = new ArrayList<Integer>(5);
SparseArray<Boolean> lastPids = new SparseArray<Boolean>(20);
if (mService.mController != null) {
try {
// 0 == continue, -1 = kill process immediately
int res = mService.mController.appEarlyNotResponding(
app.processName, app.pid, annotation);
if (res < 0 && app.pid != MY_PID) {
app.kill("anr", true);
}
} catch (RemoteException e) {
mService.mController = null;
Watchdog.getInstance().setActivityController(null);
}
}
long anrTime = SystemClock.uptimeMillis();
if (ActivityManagerService.MONITOR_CPU_USAGE) {
mService.updateCpuStatsNow();
}
// Unless configured otherwise, swallow ANRs in background processes & kill the process.
boolean showBackground = Settings.Secure.getInt(mContext.getContentResolver(),
Settings.Secure.ANR_SHOW_BACKGROUND, 0) != 0;
boolean isSilentANR;
synchronized (mService) {
// PowerManager.reboot() can block for a long time, so ignore ANRs while shutting down.
if (mService.mShuttingDown) {
Slog.i(TAG, "During shutdown skipping ANR: " + app + " " + annotation);
return;
} else if (app.notResponding) {
Slog.i(TAG, "Skipping duplicate ANR: " + app + " " + annotation);
return;
} else if (app.crashing) {
Slog.i(TAG, "Crashing app skipping ANR: " + app + " " + annotation);
return;
} else if (app.killedByAm) {
Slog.i(TAG, "App already killed by AM skipping ANR: " + app + " " + annotation);
return;
} else if (app.killed) {
Slog.i(TAG, "Skipping died app ANR: " + app + " " + annotation);
return;
}
// In case we come through here for the same app before completing
// this one, mark as anring now so we will bail out.
app.notResponding = true;
// 记录ANR到event log
EventLog.writeEvent(EventLogTags.AM_ANR, app.userId, app.pid,
app.processName, app.info.flags, annotation);
// Dump thread traces as quickly as we can, starting with "interesting" processes.
firstPids.add(app.pid);
// Don't dump other PIDs if it's a background ANR
isSilentANR = !showBackground && !isInterestingForBackgroundTraces(app);
if (!isSilentANR) {
int parentPid = app.pid;
if (parent != null && parent.app != null && parent.app.pid > 0) {
parentPid = parent.app.pid;
}
if (parentPid != app.pid) firstPids.add(parentPid);
if (MY_PID != app.pid && MY_PID != parentPid) firstPids.add(MY_PID);
for (int i = mService.mLruProcesses.size() - 1; i >= 0; i--) {
ProcessRecord r = mService.mLruProcesses.get(i);
if (r != null && r.thread != null) {
int pid = r.pid;
if (pid > 0 && pid != app.pid && pid != parentPid && pid != MY_PID) {
if (r.persistent) {
firstPids.add(pid);
if (DEBUG_ANR) Slog.i(TAG, "Adding persistent proc: " + r);
} else if (r.treatLikeActivity) {
firstPids.add(pid);
if (DEBUG_ANR) Slog.i(TAG, "Adding likely IME: " + r);
} else {
lastPids.put(pid, Boolean.TRUE);
if (DEBUG_ANR) Slog.i(TAG, "Adding ANR proc: " + r);
}
}
}
}
}
}
// 记录ANR 到 main log
StringBuilder info = new StringBuilder();
info.setLength(0);
info.append("ANR in ").append(app.processName); //例如:ANR in com.xxxx.moblie (进程名)
if (activity != null && activity.shortComponentName != null) {
info.append(" (").append(activity.shortComponentName).append(")");//例如:(com.xxxx.moblie/.ui.MainActivity)
}
info.append("n");//换行
info.append("PID: ").append(app.pid).append("n");//记录进程id
if (annotation != null) {
info.append("Reason: ").append(annotation).append("n");
}
if (parent != null && parent != activity) {
info.append("Parent: ").append(parent.shortComponentName).append("n");
}
ProcessCpuTracker processCpuTracker = new ProcessCpuTracker(true);
// don't dump native PIDs for background ANRs unless it is the process of interest
String[] nativeProcs = null;
if (isSilentANR) {
for (int i = 0; i < NATIVE_STACKS_OF_INTEREST.length; i++) {
if (NATIVE_STACKS_OF_INTEREST[i].equals(app.processName)) {
nativeProcs = new String[] { app.processName };
break;
}
}
} else {
nativeProcs = NATIVE_STACKS_OF_INTEREST;
}
int[] pids = nativeProcs == null ? null : Process.getPidsForCommands(nativeProcs);
ArrayList<Integer> nativePids = null;
if (pids != null) {
nativePids = new ArrayList<Integer>(pids.length);
for (int i : pids) {
nativePids.add(i);
}
}
// For background ANRs, don't pass the ProcessCpuTracker to
// avoid spending 1/2 second collecting stats to rank lastPids.
File tracesFile = ActivityManagerService.dumpStackTraces(
true, firstPids,
(isSilentANR) ? null : processCpuTracker,
(isSilentANR) ? null : lastPids,
nativePids);
String cpuInfo = null;
if (ActivityManagerService.MONITOR_CPU_USAGE) {
mService.updateCpuStatsNow();
synchronized (mService.mProcessCpuTracker) {
cpuInfo = mService.mProcessCpuTracker.printCurrentState(anrTime);
}
info.append(processCpuTracker.printCurrentLoad());
info.append(cpuInfo);
}
info.append(processCpuTracker.printCurrentState(anrTime));
//将ANR相关的错误日志信息打印出来
Slog.e(TAG, info.toString());
...省略...
}
通过上面的代码我们知道,当ANR发生的时候,会将ANR记录到event log和main log中。
查看mobilelog文件夹下的events_log,从日志中搜索关键字:am_anr,找到出现ANR的时间点、进程PID、ANR类型。
如日志:
07-20 15:36:36.472 1000 1520 1597 I am_anr : [0,1480,com.xxxx.moblie,952680005,Input dispatching timed out (AppWindowToken{da8f666 token=Token{5501f51 ActivityRecord{15c5c78 u0 com.xxxx.moblie/.ui.MainActivity t3862}}}, Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)]
从上面的log我们可以看出: 应用com.xxxx.moblie 在07-20 15:36:36.472时间,发生了一次KeyDispatchTimeout类型的ANR,它的进程号是1480.
把关键的信息整理一下:
我们已经知道了发生KeyDispatchTimeout的ANR是因为 input事件在5秒内没有处理完成。那么在这个时间07-20 15:36:36.472 的前5秒,也就是(15:36:30 ~15:36:31)时间段左右程序到底做了什么事情?这个简单,因为我们已经知道pid了,再搜索一下pid = 1480的日志.这些日志表示该进程所运行的轨迹,关键的日志如下:
07-20 15:36:29.749 10102 1480 1737 D moblie-Application: [Thread:17329] receive an intent from server, action=com.ttt.push.RECEIVE_MESSAGE
07-20 15:36:30.136 10102 1480 1737 D moblie-Application: receiving an empty message, drop
07-20 15:36:35.791 10102 1480 1766 I Adreno : QUALCOMM build : 9c9b012, I92eb381bc9
07-20 15:36:35.791 10102 1480 1766 I Adreno : Build Date : 12/31/17
07-20 15:36:35.791 10102 1480 1766 I Adreno : OpenGL ES Shader Compiler Version: EV031.22.00.01
07-20 15:36:35.791 10102 1480 1766 I Adreno : Local Branch :
07-20 15:36:35.791 10102 1480 1766 I Adreno : Remote Branch : refs/tags/AU_linux_ANDROID_LA.UM.6.4.R1.08.00.00.309.049
07-20 15:36:35.791 10102 1480 1766 I Adreno : Remote Branch : NONE
07-20 15:36:35.791 10102 1480 1766 I Adreno : Reconstruct Branch : NOTHING
07-20 15:36:35.826 10102 1480 1766 I vndksupport: sphal namespace is not configured for this process. Loading /vendor/lib64/hw/gralloc.msm8998.so from the current namespace instead.
07-20 15:36:36.682 10102 1480 1480 W ViewRootImpl[MainActivity]: Cancelling event due to no window focus: KeyEvent { action=ACTION_UP, keyCode=KEYCODE_PERIOD, scanCode=0, metaState=0, flags=0x28, repeatCount=0, eventTime=16099429, downTime=16099429, deviceId=-1, source=0x101 }
从上面我们可以知道,在时间 07-20 15:36:29.749 程序收到了一个action消息。
07-20 15:36:29.749 10102 1480 1737 D moblie-Application: [Thread:17329] receive an intent from server, action=com.ttt.push.RECEIVE_MESSAGE。
原来是应用com.xxxx.moblie 收到了一个推送消息(com.ttt.push.RECEIVE_MESSAGE)导致了阻塞,我们再串联一下目前所获取到的信息:在时间07-20 15:36:29.749 应用com.xxxx.moblie 收到了一下推送信息action=com.ttt.push.RECEIVE_MESSAGE发生阻塞,5秒后发生了KeyDispatchTimeout的ANR。
在分析ANR的时候,我们首先要确认是不是当时CPU很紧张、各路APP都在抢占资源,CPU无法及时响应最终导致了ANR?为了排查这种情况,我们就需要获取ANR发生时候的CPU状态参数。
1、接下来我们来看一个ANR模拟日志案例
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: ANR in com.xxxx.moblie (com.xxxx.moblie/.ui.MainActivity) (关键字ANR in + 进程名 + Activity名称)
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: PID: 1480 (进程pid)
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: Reason: Input dispatching timed out (AppWindowToken{da8f666 token=Token{5501f51 ActivityRecord{15c5c78 u0 com.xxxx.moblie/.ui.MainActivity t3862}}}, Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.)(ANR的原因,输入分发超时)
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: Load: 0.0 / 0.0 / 0.0 (Load表明是1分钟,5分钟,15分钟CPU的负载)
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: CPU usage from 20ms to 20286ms later (2018-07-20 15:36:36.170 to 2018-07-20 15:36:56.436):
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 42% 6774/pressure: 41% user + 1.4% kernel / faults: 168 minor
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 34% 142/kswapd0: 0% user + 34% kernel
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 31% 1520/system_server: 13% user + 18% kernel / faults: 58724 minor 1585 major
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 13% 29901/com.ss.android.article.news: 7.7% user + 6% kernel / faults: 56007 minor 2446 major
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 13% 32638/com.android.quicksearchbox: 9.4% user + 3.8% kernel / faults: 48999 minor 1540 major
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 11% (CPU的使用率)1480/com.xxxx.moblie: 5.2%(用户态的使用率) user + (内核态的使用率) 6.3% kernel / faults: 76401 minor 2422 major
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 8.2% 21000/kworker/u16:12: 0% user + 8.2% kernel
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 0.8% 724/mtd: 0% user + 0.8% kernel / faults: 1561 minor 9 major
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 8% 29704/kworker/u16:8: 0% user + 8% kernel
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 7.9% 24391/kworker/u16:18: 0% user + 7.9% kernel
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 7.1% 30656/kworker/u16:14: 0% user + 7.1% kernel
07-20 15:36:58.711 1000 1520 1597 E ActivityManager: 7.1% 9998/kworker/u16:4: 0% user + 7.1% kernel
通过上面所提供的案例我们可以分析出以下几点:
2、下面所提供的是一个ANR的真实日志案例
10-09 19:35:22.124 940 968 E ActivityManager: ANR in com.example.anrtest (com.example.anrtest/.MainActivity) // 记录ANR+进程名+Activity名称
10-09 19:35:22.124 940 968 E ActivityManager: PID: 8390 //记录进程ID
10-09 19:35:22.124 940 968 E ActivityManager: Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 28. Wait queue head age: 5517.5ms.)(ANR的原因,输入分发超时)
10-09 19:35:22.124 940 968 E ActivityManager: Load: 2.52 / 2.57 / 2.73
10-09 19:35:22.124 940 968 E ActivityManager: CPU usage from 99984ms to 0ms ago (2022-10-09 19:33:39.209 to 2022-10-09 19:35:19.194):
10-09 19:35:22.124 940 968 E ActivityManager: 25% 445/surfaceflinger: 14% user + 11% kernel / faults: 11962 minor
10-09 19:35:22.124 940 968 E ActivityManager: 11% 394/android.hardware.graphics.composer@2.2-service: 2.9% user + 8.2% kernel / faults: 1 minor
10-09 19:35:22.124 940 968 E ActivityManager: 10% 2101/com.leapmotor.appcenter: 7.8% user + 3.1% kernel / faults: 926 minor
10-09 19:35:22.124 940 968 E ActivityManager: 3.2% 1961/com.iflytek.cutefly.speechclient.hmi: 2.5% user + 0.7% kernel / faults: 2111 minor
10-09 19:35:22.124 940 968 E ActivityManager: 1.8% 386/android.hardware.audio@2.0-service: 0.2% user + 1.6% kernel10-09 19:35:22.124 940 968 E ActivityManager: 0.9% 940/system_server: 0.6% user + 0.3% kernel / faults: 5334 minor10-09 19:35:22.124 940 968 E ActivityManager: 0.7% 440/audIOServer: 0.3% user + 0.3% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0.3% 277/apr_vm_cb_threa: 0% user + 0.3% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0.3% 544/leap_uhab: 0% user + 0.2% kernel / faults: 20 minor
10-09 19:35:22.124 940 968 E ActivityManager: 0.3% 5219/com.leapmotor.leapmotorsoscall: 0.1% user + 0.1% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0.2% 2367/com.leapmotor.multimedia: 0% user + 0.2% kernel / faults: 9 minor
10-09 19:35:22.124 940 968 E ActivityManager: 0.2% 8215/kworker/2:3: 0% user + 0.2% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0.2% 3992/adbd: 0% user + 0.2% kernel / faults: 46652 minor
10-09 19:35:22.124 940 968 E ActivityManager: 0.2% 7786/com.ebanma.tinyapp: 0.2% user + 0% kernel / faults: 557 minor
10-09 19:35:22.124 940 968 E ActivityManager: 0.2% 7895/com.ebanma.tinyapp:DataCenterService: 0.2% user + 0% kernel / faults: 463 minor
10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 543/leap_systemsdk_service: 0% user + 0.1% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 1392/com.leapmotor.cameraaround: 0% user + 0.1% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 410/leap_vsomeip_route: 0.1% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 3554/com.leapmotor.driverecord:emergency: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 411/leap_camera_around: 0% user + 0.1% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 2915/com.leapmotor.systemupdate: 0% user + 0% kernel / faults: 5 minor
10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 9/rcu_preempt: 0% user + 0.1% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 412/leap_camera_face: 0% user + 0.1% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 413/leap_camera_front: 0% user + 0.1% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 2021/com.leapmotor.facevideo: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 2554/com.leapmotor.phone: 0% user + 0.1% kernel / faults: 10 minor
10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 1916/com.leapmotor.log: 0% user + 0% kernel / faults: 2 minor
10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 2188/com.leapmotor.camera:front_encode: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 2211/com.leapmotor.camera:around_encode: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0.1% 3541/com.leapmotor.driverecord:trip: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 285/logd: 0% user + 0% kernel / faults: 3 minor
10-09 19:35:22.124 940 968 E ActivityManager: 0% 325/ais_v4l2_proxy: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 541/leap_shutdown: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 2243/com.leapmotor.camera:front_push_encode: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 1199/com.android.systemui: 0% user + 0% kernel / faults: 576 minor
10-09 19:35:22.124 940 968 E ActivityManager: 0% 3492/com.leapmotor.driverecord: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 251/vlog: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 560/installd: 0% user + 0% kernel / faults: 8 minor
10-09 19:35:22.124 940 968 E ActivityManager: 0% 4106/kworker/u8:0: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 6191/installer: 0% user + 0% kernel / faults: 295 minor
10-09 19:35:22.124 940 968 E ActivityManager: 0% 8/ksoftirqd/0: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 18/ksoftirqd/1: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 1131/com.android.car: 0% user + 0% kernel / faults: 553 minor
10-09 19:35:22.124 940 968 E ActivityManager: 0% 2134/com.leapmotor.bt:bt_service: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 7818/kworker/u9:0: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 24/ksoftirqd/2: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 512/leap_logcat: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 10/rcu_sched: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 30/ksoftirqd/3: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 70/system: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 240/kworker/u8:10: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 281/jbd2/vdb-8: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 286/servicemanager: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 287/hwservicemanager: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 330/zygote64: 0% user + 0% kernel / faults: 153 minor
10-09 19:35:22.124 940 968 E ActivityManager: 0% 409/vendor.qti.hardware.perf@1.0-service: 0% user + 0% kernel / faults: 37 minor
10-09 19:35:22.124 940 968 E ActivityManager: 0% 542/leap_vsomeip_qnx_heart: 0% user + 0% kernel / faults: 3 minor10-09 19:35:22.124 940 968 E ActivityManager: 0% 654/msm_irqbalance: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 677/ipacm: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 1366/.dataservices: 0% user + 0% kernel / faults: 3 minor
10-09 19:35:22.124 940 968 E ActivityManager: 0% 2665/dmesg: 0% user + 0% kernel
10-09 19:35:22.124 940 968 E ActivityManager: 0% 2893/android.process.acore: 0% user + 0% kernel / faults: 35 minor
10-09 19:35:22.124 2101 2120 I motor.appcente: Wrote stack traces to '[tombstoned]'
10-09 19:35:22.125 940 969 W ActivityManager: Force finishing activity com.example.anrtest/.MainActivity
10-09 19:35:22.127 940 969 I ActivityManager: saveTopActivity mTmpTop=com.example.anrtest/com.example.anrtest.MainActivity
通过上面所提供的案例我们可以分析出以下几点:
通过上面几点我们虽然排除了CPU过度负载的可能,但我们并不能准确定位出ANR的确切位置,要想准确定位出ANR发生的确切位置,就要借助系统为了解决ANR问题而提供的终极大杀器——traces.txt文件了。
当APP不响应、响应慢了、或者WatchDog的监视没有得到回应时,系统就会dump出一个traces.txt文件,存放在文件目录:/data/anr/文件夹中,通过traces文件,我们可以拿到线程名、堆栈信息、线程当前状态、binder call等信息。
我们可以通过adb命令获取到该文件夹下面的所有traces文件:adb pull /data/anr
trace:
Cmd line:com.xxxx.moblie
"main" prio=5 tid=1 Runnable
| group="main" sCount=0 dsCount=0 obj=0x73bcc7d0 self=0x7f20814c00
| sysTid=20176 nice=-10 cgrp=default sched=0/0 handle=0x7f251349b0
| state=R schedstat=( 0 0 0 ) utm=12 stm=3 core=5 HZ=100
| stack=0x7fdb75e000-0x7fdb760000 stackSize=8MB
| held mutexes= "mutator lock"(shared held)
// java 堆栈调用信息,可以查看调用的关系,定位到具体位置
at ttt.push.InterceptorProxy.addMiuiApplication(InterceptorProxy.java:77)
at ttt.push.InterceptorProxy.create(InterceptorProxy.java:59)
at android.app.Activity.onCreate(Activity.java:1041)
at miui.app.Activity.onCreate(SourceFile:47)
at com.xxxx.moblie.ui.b.onCreate(SourceFile:172)
at com.xxxx.moblie.ui.MainActivity.onCreate(SourceFile:68)
at android.app.Activity.performCreate(Activity.java:7050)
at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1214)
at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:2807)
at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:2929)
at android.app.ActivityThread.-wrap11(ActivityThread.java:-1)
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1618)
at android.os.Handler.dispatchMessage(Handler.java:105)
at android.os.Looper.loop(Looper.java:171)
at android.app.ActivityThread.main(ActivityThread.java:6699)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.Zygote$MethodAndArgsCaller.run(Zygote.java:246)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:783)
这里详细解析一下traces.txt里面的一些字段,看看它到底能给我们提供什么信息.
Java的堆栈信息是我们最关心的,它能够定位到具体位置。从上面的traces,我们可以判断ttt.push.InterceptorProxy.addMiuiApplicationInterceptorProxy.java:77 导致了com.xxxx.moblie发生了ANR。这时候可以对着源码查看,找到出问题,并且解决它。
接下来再看一个真实的导致ANR的trace案例:
//开头显示进程号、ANR发生的时间点和进程名称
----- pid 4972 at 2022-10-09 16:45:41 -----
Cmd line: com.example.anrtest
...省略...
DALVIK THREADS (14)://以下是各个线程的函数堆栈信息
//依次是:线程名(后面标识有daemon说明是个守护线程)、线程优先级、线程号、线程当前状态(TIMED_WAIT或SUSPENDED在anr时很常见)
"Signal Catcher" daemon prio=5 tid=3 Runnable
//依次是:线程组名称、suspendCount个数、debugSuspendCount个数、标记,线程的Java对象地址、线程的Native对象地址
| group="system" sCount=0 dsCount=0 flags=0 obj=0x15980100 self=0x7322e16400
//sysTid是线程号,主线程的线程号和进程号相同
| sysTid=4978 nice=0 cgrp=default sched=0/0 handle=0x731c3304f0
| state=R schedstat=( 5583230 6778645 10 ) utm=0 stm=0 core=3 HZ=100
| stack=0x731c235000-0x731c237000 stackSize=1009KB
| held mutexes= "mutator lock"(shared held)
//“Signal Catcher”负责接收和处理kernel发送的各种信号,例如SIGNAL_QUIT、SIGNAL_USR1等就是被该线程
//接收到,这个文件的内容就是由该线程负责输出的,可以看到它的状态是RUNNABLE,不过此线程也不需要关心
...省略...
//主线程,当前处于休眠状态
"main" prio=5 tid=1 Sleeping
| group="main" sCount=1 dsCount=0 flags=1 obj=0x73e64b08 self=0x7322e14c00
| sysTid=4972 nice=-10 cgrp=default sched=0/0 handle=0x73a88cc548
| state=S schedstat=( 618097798 184039398 700 ) utm=56 stm=5 core=3 HZ=100
| stack=0x7fd05f2000-0x7fd05f4000 stackSize=8MB
| held mutexes=
at java.lang.Thread.sleep(Native method)
- sleeping on <0x0d9c45cc> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:373)
- locked <0x0d9c45cc> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:314)
at android.os.SystemClock.sleep(SystemClock.java:127)
at com.example.anrtest.MainActivity$1.onClick(MainActivity.java:57)
at android.view.View.performClick(View.java:6597)
at android.view.View.performClickInternal(View.java:6574)
at android.view.View.access$3100(View.java:778)
at android.view.View$PerformClick.run(View.java:25885)
at android.os.Handler.handleCallback(Handler.java:873)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:193)
at android.app.ActivityThread.main(ActivityThread.java:6718)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)
...省略...
"Jit thread pool worker thread 0" daemon prio=5 tid=2 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x15980000 self=0x731c029000
| sysTid=4977 nice=9 cgrp=default sched=0/0 handle=0x731c4314f0
| state=S schedstat=( 122669387 79479997 188 ) utm=8 stm=3 core=3 HZ=100
| stack=0x731c333000-0x731c335000 stackSize=1021KB
| held mutexes=
...省略...
//Binder线程是进程的线程池中用来处理binder请求的线程
"Binder:4972_1" prio=5 tid=9 Native
| group="main" sCount=1 dsCount=0 flags=1 obj=0x159829a0 self=0x731a60d000
| sysTid=4984 nice=0 cgrp=default sched=0/0 handle=0x73099b34f0
| state=S schedstat=( 12586983 41079166 22 ) utm=1 stm=0 core=3 HZ=100
| stack=0x73098b8000-0x73098ba000 stackSize=1009KB
| held mutexes=
...省略...
总结一下上面我们分析ANR的主体流程:
1、首先我们搜索am_anr,找到出现ANR的时间点、进程PID、ANR类型、然后再找搜索PID,找前5秒左右的日志。
2、过滤ANR IN 查看CPU信息
3、接着查看traces.txt,找到java的堆栈信息定位代码位置,最后查看源码,分析与解决问题。
到这里,通过上面三个步骤我们基本就能定位出来大部分ANR的来龙去脉了。
我在接下来的第六步和第七步还继续提供了多个ANR案例供大家参考分析,有兴趣的可以看着。
1、主线程被其他线程lock,导致死锁
waiting on <0x1cd570> (a android.os.MessageQueue)
DALVIK THREADS:
"main" prio=5 tid=3 TIMED_WAIT
| group="main" sCount=1 dsCount=0 s=0 obj=0x400143a8
| sysTid=691 nice=0 sched=0/0 handle=-1091117924
at java.lang.Object.wait(Native Method)
- waiting on <0x1cd570> (a android.os.MessageQueue)
at java.lang.Object.wait(Object.java:195)
at android.os.MessageQueue.next(MessageQueue.java:144)
at android.os.Looper.loop(Looper.java:110)
at android.app.ActivityThread.main(ActivityThread.java:3742)
at java.lang.reflect.Method.invokeNative(Native Method)
at java.lang.reflect.Method.invoke(Method.java:515)
at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:739)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:497)
at dalvik.system.NativeStart.main(Native Method)
"Binder Thread #3" prio=5 tid=15 NATIVE
| group="main" sCount=1 dsCount=0 s=0 obj=0x434e7758
| sysTid=734 nice=0 sched=0/0 handle=1733632
at dalvik.system.NativeStart.run(Native Method)
"Binder Thread #2" prio=5 tid=13 NATIVE
| group="main" sCount=1 dsCount=0 s=0 obj=0x1cd570
| sysTid=696 nice=0 sched=0/0 handle=1369840
at dalvik.system.NativeStart.run(Native Method)
"Binder Thread #1" prio=5 tid=11 NATIVE
| group="main" sCount=1 dsCount=0 s=0 obj=0x433aca10
| sysTid=695 nice=0 sched=0/0 handle=1367448
at dalvik.system.NativeStart.run(Native Method)
----- end 691 -----
2、主线程做耗时的操作:比如数据库读写。
"main" prio=5 tid=1 Native
held mutexes=
kernel: (couldn't read /proc/self/task/11003/stack)
native: #00 pc 000492a4 /system/lib/libc.so (nanosleep+12)
native: #01 pc 0002dc21 /system/lib/libc.so (usleep+52)
native: #02 pc 00009cab /system/lib/libsqlite.so (???)
native: #03 pc 00011119 /system/lib/libsqlite.so (???)
native: #04 pc 00016455 /system/lib/libsqlite.so (???)
native: #16 pc 0000fa29 /system/lib/libsqlite.so (???)
native: #17 pc 0000fad7 /system/lib/libsqlite.so (sqlite3_prepare16_v2+14)
native: #18 pc 0007f671 /system/lib/libandroid_runtime.so (???)
native: #19 pc 002b4721 /system/framework/arm/boot-framework.oat (Java_android_database_sqlite_SQLiteConnection_nativePrepareStatement__JLjava_lang_String_2+116)
at android.database.sqlite.SQLiteConnection.setWalModeFromConfiguration(SQLiteConnection.java:294)
at android.database.sqlite.SQLiteConnection.open(SQLiteConnection.java:215)
at android.database.sqlite.SQLiteConnection.open(SQLiteConnection.java:193)
at android.database.sqlite.SQLiteConnectionPool.openConnectionLocked(SQLiteConnectionPool.java:463)
at android.database.sqlite.SQLiteConnectionPool.open(SQLiteConnectionPool.java:185)
at android.database.sqlite.SQLiteConnectionPool.open(SQLiteConnectionPool.java:177)
at android.database.sqlite.SQLiteDatabase.openInner(SQLiteDatabase.java:808)
locked <0x0db193bf> (a java.lang.Object)
at android.database.sqlite.SQLiteDatabase.open(SQLiteDatabase.java:793)
at android.database.sqlite.SQLiteDatabase.openDatabase(SQLiteDatabase.java:696)
at android.app.ContextImpl.openOrCreateDatabase(ContextImpl.java:690)
at android.content.ContextWrapper.openOrCreateDatabase(ContextWrapper.java:299)
at android.database.sqlite.SQLiteOpenHelper.getDatabaseLocked(SQLiteOpenHelper.java:223)
at android.database.sqlite.SQLiteOpenHelper.getWritableDatabase(SQLiteOpenHelper.java:163)
locked <0x045a4a8c> (a com.xxxx.video.common.data.DataBaseHelper)
at com.xxxx.video.common.data.DataBaseoRM.<init>(DataBaseORM.java:46)
at com.xxxx.video.common.data.DataBaseORM.getInstance(DataBaseORM.java:53)
locked <0x017095d5> (a java.lang.Class<com.xxxx.video.common.data.DataBaseORM>)
3、binder数据量过大
07-21 04:43:21.573 1000 1488 12756 E Binder : Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 388568 (data: 1, 32, 7274595)
07-21 04:43:21.573 1000 1488 12756 E Binder : android.util.Log$TerribleFailure: Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 388568 (data: 1, 32, 7274595)
07-21 04:43:21.607 1000 1488 2951 E Binder : Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 211848 (data: 1, 23, 7274595)
07-21 04:43:21.607 1000 1488 2951 E Binder : android.util.Log$TerribleFailure: Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 211848 (data: 1, 23, 7274595)
07-21 04:43:21.662 1000 1488 6258 E Binder : Unreasonably large binder reply buffer: on android.content.pm.BaseParceledListSlice$1@770c74f calling 1 size 259300 (data: 1, 33, 7274595)
4、binder 通信失败
07-21 06:04:35.580 <6>[32837.690321] binder: 1698:2362 transaction failed 29189/-3, size 100-0 line 3042
07-21 06:04:35.594 <6>[32837.704042] binder: 1765:4071 transaction failed 29189/-3, size 76-0 line 3042
07-21 06:04:35.899 <6>[32838.009132] binder: 1765:4067 transaction failed 29189/-3, size 224-8 line 3042
07-21 06:04:36.018 <6>[32838.128903] binder: 1765:2397 transaction failed 29189/-22, size 348-0 line 2916
以下的几个案例,都是我通过代码故意触发ANR所产生的trace文件。
1、输入的事件在5秒内没有被响应,主线程被阻塞。
输入事件5s没有响应,如onClick事件。这是anr问题的主要类型,一般开发者不会犯这样的错,凡是耗时的操作都另开线程处理,如果疏忽了看看自己的代码就知道怎么处理。
(1)模拟触发ANR:
Java
findViewById(R.id.btn_test).setOnClickListener(new View.OnClickListener() {
@Override
public void onClick(View v) {
SystemClock.sleep(40000);//故意响应超时导致ANR
}
});
(2)对应的Trace文件关键日志:
DALVIK THREADS (14): //虚拟机线程被阻塞
"Signal Catcher" daemon prio=5 tid=3 Runnable
| group="system" sCount=0 dsCount=0 flags=0 obj=0x15980100 self=0x7322e16400
| sysTid=4978 nice=0 cgrp=default sched=0/0 handle=0x731c3304f0
| state=R schedstat=( 5583230 6778645 10 ) utm=0 stm=0 core=3 HZ=100
| stack=0x731c235000-0x731c237000 stackSize=1009KB
| held mutexes= "mutator lock"(shared held)
"main" prio=5 tid=1 Sleeping //主线程被阻塞
| group="main" sCount=1 dsCount=0 flags=1 obj=0x73e64b08 self=0x7322e14c00
| sysTid=4972 nice=-10 cgrp=default sched=0/0 handle=0x73a88cc548
| state=S schedstat=( 618097798 184039398 700 ) utm=56 stm=5 core=3 HZ=100
| stack=0x7fd05f2000-0x7fd05f4000 stackSize=8MB
| held mutexes=
at java.lang.Thread.sleep(Native method)
- sleeping on <0x0d9c45cc> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:373)
- locked <0x0d9c45cc> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:314)
at android.os.SystemClock.sleep(SystemClock.java:127) //代码被阻塞的关键位置
at com.example.anrtest.MainActivity$1.onClick(MainActivity.java:57) //按钮的点击事件
at android.view.View.performClick(View.java:6597)
at android.view.View.performClickInternal(View.java:6574)
at android.view.View.access$3100(View.java:778)
at android.view.View$PerformClick.run(View.java:25885)
at android.os.Handler.handleCallback(Handler.java:873)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:193)
at android.app.ActivityThread.main(ActivityThread.java:6718)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)
2、广播接收者的onReceive方法在10秒内没有执行完毕。
BroadcastReceiver是在程序主线程运行,而且默认情况下BroadcastReceiver的运行时间为10s,所以不能有耗时操作,如果耗时超过10s就会导致anr,从traces文件log就可以看出onReceive不能进行耗时任务。
(1)模拟触发ANR:
Java
IntentFilter intentFilter = new IntentFilter();
intentFilter.addAction("com.anr.test");
registerReceiver(new BroadcastReceiver() {
@Override
public void onReceive(Context context, Intent intent) {
SystemClock.sleep(40000);//故意响应超时导致ANR
}
}, intentFilter);
(2)对应的Trace文件关键日志:
DALVIK THREADS (14): //虚拟机线程被阻塞
"Signal Catcher" daemon prio=5 tid=3 Runnable
| group="system" sCount=0 dsCount=0 flags=0 obj=0x13100088 self=0x7322e16400
| sysTid=6639 nice=0 cgrp=default sched=0/0 handle=0x731c3304f0
| state=R schedstat=( 6547603 5577708 11 ) utm=0 stm=0 core=0 HZ=100
| stack=0x731c235000-0x731c237000 stackSize=1009KB
| held mutexes= "mutator lock"(shared held)
"main" prio=5 tid=1 Sleeping //主线程被阻塞
| group="main" sCount=1 dsCount=0 flags=1 obj=0x73e64b08 self=0x7322e14c00
| sysTid=6633 nice=-10 cgrp=default sched=0/0 handle=0x73a88cc548
| state=S schedstat=( 605006551 138044643 533 ) utm=56 stm=3 core=2 HZ=100
| stack=0x7fd05f2000-0x7fd05f4000 stackSize=8MB
| held mutexes=
at java.lang.Thread.sleep(Native method)
- sleeping on <0x08c05759> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:373)
- locked <0x08c05759> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:314)
at android.os.SystemClock.sleep(SystemClock.java:127) //代码被阻塞的关键位置
at com.example.anrtest.MainActivity$3.onReceive(MainActivity.java:80) //广播接收者的onReceive方法
at android.app.LoadedApk$ReceiverDispatcher$Args.lambda$getRunnable$0(LoadedApk.java:1391)
at android.app.-$$Lambda$LoadedApk$ReceiverDispatcher$Args$_BumDX2UKsnxLVrE6UJsJZkotuA.run(lambda:-1)
at android.os.Handler.handleCallback(Handler.java:873)
at android.os.Handler.dispatchMessage(Handler.java:99)
at android.os.Looper.loop(Looper.java:193)
at android.app.ActivityThread.main(ActivityThread.java:6718)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)
BroadcastReceiver是Android的消息组件,用来组件之间、应用之间的消息传递,生命周期太短也不能开子线程处理耗时任务,耗时任务一般转交给IntentService或者JobIntentService去做。
3、服务Service没有及时响应
Service是计算型组件,虽然在后台运行,但是本质上它也跑在主线程,如果你的服务要做任何CPU密集型(如MP3播放)或阻塞(如网络)操作,都要放在子线程中,否则耗时超过20s就会导致anr。下面我在onStartCommand方法中休眠40秒。
(1)模拟触发ANR:
Java
@Override
public int onStartCommand(Intent intent, int flags, int startId) {
SystemClock.sleep(40000);//故意响应超时导致ANR
return super.onStartCommand(intent, flags, startId);
}
(2)对应的Trace文件关键日志:
DALVIK THREADS (13): //虚拟机线程被阻塞
"Signal Catcher" daemon prio=5 tid=3 Runnable
| group="system" sCount=0 dsCount=0 flags=0 obj=0x13940108 self=0x7322e16400
| sysTid=7658 nice=0 cgrp=default sched=0/0 handle=0x731c3304f0
| state=R schedstat=( 5995782 10695989 10 ) utm=0 stm=0 core=3 HZ=100
| stack=0x731c235000-0x731c237000 stackSize=1009KB
| held mutexes= "mutator lock"(shared held)
"main" prio=5 tid=1 Sleeping //主线程被阻塞
| group="main" sCount=1 dsCount=0 flags=1 obj=0x73e64b08 self=0x7322e14c00
| sysTid=7652 nice=-10 cgrp=default sched=0/0 handle=0x73a88cc548
| state=S schedstat=( 595004855 101962599 596 ) utm=40 stm=18 core=2 HZ=100
| stack=0x7fd05f2000-0x7fd05f4000 stackSize=8MB
| held mutexes=
at java.lang.Thread.sleep(Native method)
- sleeping on <0x0e096f2a> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:373)
- locked <0x0e096f2a> (a java.lang.Object)
at java.lang.Thread.sleep(Thread.java:314)
at android.os.SystemClock.sleep(SystemClock.java:127) //代码被阻塞的关键位置
at com.example.anrtest.ANRService.onStartCommand(ANRService.java:17) //服务的onStartCommand方法
at android.app.ActivityThread.handleServiceArgs(ActivityThread.java:3705)
at android.app.ActivityThread.access$1600(ActivityThread.java:200)
at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1688)
at android.os.Handler.dispatchMessage(Handler.java:106)
at android.os.Looper.loop(Looper.java:193)
at android.app.ActivityThread.main(ActivityThread.java:6718)
at java.lang.reflect.Method.invoke(Native method)
at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:858)