Troubleshooting high CPU usage of NetEase Cloud Music on Mac
Problem description
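The NetEase Cloud Music client on macOS 10.13.6 intermittently uses 500% CPU, and reinstalling it did not help.
I searched around and found no reports of a similar problem. I had just been studying CPU troubleshooting on Linux, so this was a chance to put it to use.
One complication: the Unix underneath macOS differs from Linux, and many of the usual tools are missing, so I had to look up substitutes as I went.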
Troubleshooting
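- I wanted to look at the individual CPU metrics first, but the built-in top does not show detailed CPU metrics; all I got was the PID, 17125
- Used a substitute for perf, sample 17125 -f output.prof followed by filtercalltree output.prof, and learned that there were a lot of context-switch-related functions, lock/unlock calls, and file operations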
Sort by top of stack, same collapsed (when >= 5):
mach_msg_trap (in libsystem_kernel.dylib) 22676 # Mach message (IPC) trap; threads usually block here waiting for a message
__psynch_cvwait (in libsystem_kernel.dylib) 6186 # waiting on a condition variable
__psynch_mutexwait (in libsystem_kernel.dylib) 4828 # waiting to acquire a lock
__workq_kernreturn (in libsystem_kernel.dylib) 4124 # work queue thread handed back to the kernel (idle worker)
__getattrlist (in libsystem_kernel.dylib) 3670 # an enhanced stat (https://blog.csdn.net/kfy2011/article/details/50427878)
__select (in libsystem_kernel.dylib) 2062 # select, the I/O multiplexing call
getattrlistbulk (in libsystem_kernel.dylib) 792 # similar to stat, for the entries of a directory
lstat$INODE64 (in libsystem_kernel.dylib) 620 # similar to stat
__psynch_mutexdrop (in libsystem_kernel.dylib) 607 # unlock
__open (in libsystem_kernel.dylib) 587 # open a file
close (in libsystem_kernel.dylib) 251 # close a file
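To get a rough feel for how the samples split between waiting, locking, and file access, the top-of-stack counts above can be tallied with a short script. This is only a sketch: the three categories and the names `CATEGORIES`, `parse_top_of_stack`, and `group_by_category` are my own, not part of sample or filtercalltree. Keep in mind that sample also records threads that are blocked, so a large "wait" share does not by itself mean CPU is being burned there.

```python
import re
from collections import Counter

# Top-of-stack lines copied from the sample output above (annotations removed).
SAMPLE_OUTPUT = """\
mach_msg_trap (in libsystem_kernel.dylib) 22676
__psynch_cvwait (in libsystem_kernel.dylib) 6186
__psynch_mutexwait (in libsystem_kernel.dylib) 4828
__workq_kernreturn (in libsystem_kernel.dylib) 4124
__getattrlist (in libsystem_kernel.dylib) 3670
__select (in libsystem_kernel.dylib) 2062
getattrlistbulk (in libsystem_kernel.dylib) 792
lstat$INODE64 (in libsystem_kernel.dylib) 620
__psynch_mutexdrop (in libsystem_kernel.dylib) 607
__open (in libsystem_kernel.dylib) 587
close (in libsystem_kernel.dylib) 251
"""

# Hypothetical grouping, chosen for this write-up only.
CATEGORIES = {
    "wait/ipc": {"mach_msg_trap", "__psynch_cvwait", "__workq_kernreturn", "__select"},
    "lock": {"__psynch_mutexwait", "__psynch_mutexdrop"},
    "file": {"__getattrlist", "getattrlistbulk", "lstat$INODE64", "__open", "close"},
}

# symbol, "(in <library>)", sample count
LINE_RE = re.compile(r"^\s*(\S+)\s+\(in [^)]+\)\s+(\d+)")


def parse_top_of_stack(text):
    """Return {symbol: sample count} for every line that looks like a top-of-stack entry."""
    counts = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts


def group_by_category(counts):
    """Sum the sample counts per category; unknown symbols go to 'other'."""
    totals = Counter()
    for symbol, n in counts.items():
        category = next(
            (name for name, symbols in CATEGORIES.items() if symbol in symbols),
            "other",
        )
        totals[category] += n
    return totals


if __name__ == "__main__":
    totals = group_by_category(parse_top_of_stack(SAMPLE_OUTPUT))
    grand_total = sum(totals.values())
    for category, n in totals.most_common():
        print(f"{category:10s} {n:6d} {n / grand_total:6.1%}")
```

With the numbers above, the file-related calls account for 5920 samples and the two mutex calls for 5435, out of 46403 in total.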
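- Opened Activity Monitor.app; double-clicking the process shows that the system has a built-in sample feature
- In the Statistics view, every metric turned out to be very high
- Compared with the QQ process: the Mach system calls were normal, but the context switch count and the Unix system call count were two orders of magnitude higher, at 100 million and 600 million respectively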
- Looked for a substitute for strace to inspect the system calls: sudo dtruss -p 17125
- Found a nonstop flood of stat-like system calls and lock/unlock operations; the paths were shown as well
getattrlist("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x70000D03BD78, 0x70000D03B9D0) = 0 0
lstat64("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x70000D247EB0, 0x0) = 0 0
getattrlist("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x70000D0BEE90, 0x70000D0BF290) = 0 0
psynch_mutexdrop(0x10B68C070, 0xB1AB1703, 0xB1AB1400) = 0 0
psynch_mutexwait(0x10B68C070, 0xB1AB1503, 0xB1AB1100) = -1314187517 0
geteuid(0x0, 0x0, 0x0) = 501 0
psynch_mutexdrop(0x10B68C070, 0xB1AB1703, 0xB1AB1500) = 0 0
psynch_mutexwait(0x10B68C070, 0xB1AB1603, 0xB1AB1200) = -1314187517 0
getattrlist("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x60C0001348EC, 0x7F8F3D926CB8) = 0 0
psynch_mutexdrop(0x10B68C070, 0xB1AB1903, 0xB1AB1600) = 0 0
getattrlist("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x70000CD29E90, 0x70000CD2A290) = 0 0
getattrlist("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x70000D247D78, 0x70000D2479D0) = 0 0
psynch_mutexwait(0x10B68C070, 0xB1AB1703, 0xB1AB1300) = -1314187005 0
lstat64("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x70000CBA0EB0, 0x0) = 0 0
geteuid(0x0, 0x0, 0x0) = 501 0
getattrlist("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x6040003248AC, 0x7F8F3E809AB8) = 0 0
psynch_mutexdrop(0x10B68C070, 0xB1AB1A03, 0xB1AB1700) = 0 0
psynch_mutexwait(0x10B68C070, 0xB1AB1803, 0xB1AB1500) = -1314186493 0
psynch_mutexdrop(0x10B68C070, 0xB1AB1B03, 0xB1AB1800) = 0 0
psynch_mutexwait(0x10B68C070, 0xB1AB1903, 0xB1AB1500) = -1314186493 0
getattrlist("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x70000D1C4E90, 0x70000D1C5290) = 0 0
getattrlist("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x70000CBA0D78, 0x70000CBA09D0) = 0 0
psynch_mutexdrop(0x10B68C070, 0xB1AB1D03, 0xB1AB1900) = 0 0
psynch_mutexwait(0x10B68C070, 0xB1AB1A03, 0xB1AB1600) = -1314185981 0
geteuid(0x0, 0x0, 0x0) = 501 0
psynch_mutexdrop(0x10B68C070, 0xB1AB1D03, 0xB1AB1A00) = 0 0
psynch_mutexwait(0x10B68C070, 0xB1AB1B03, 0xB1AB1700) = -1314185981 0
getattrlist("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x70000CA9AE90, 0x70000CA9B290) = 0 0
getattrlist("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x60800032264C, 0x7F8F3D0156B8) = 0 0
open("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x1100004, 0x5) = 60 0
psynch_mutexdrop(0x10B68C070, 0xB1AB1E03, 0xB1AB1B00) = 0 0
getattrlistbulk(0x3C, 0x60C00031A8C8, 0x7F8F3D92B000) = 0 0
getattrlist("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x604000322BEC, 0x7F8F3E8088B8) = 0 0
close(0x3C) = 0 0
psynch_mutexwait(0x10B68C070, 0xB1AB1C03, 0xB1AB1800) = -1314185725 0
psynch_mutexdrop(0x10B68C070, 0xB1AB2203, 0xB1AB1C00) = 0 0
psynch_mutexwait(0x10B68C070, 0xB1AB1D03, 0xB1AB1800) = -1314184701 0
psynch_mutexdrop(0x10B68C070, 0xB1AB2203, 0xB1AB1D00) = 0 0
psynch_mutexwait(0x10B68C070, 0xB1AB1E03, 0xB1AB1A00) = -1314184701 0
open("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x1100004, 0x7) = 60 0
psynch_mutexdrop(0x10B68C070, 0xB1AB2203, 0xB1AB1E00) = 0 0
psynch_mutexwait(0x10B68C070, 0xB1AB1F03, 0xB1AB1B00) = -1314184701 0
getattrlist("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x70000D03BE90, 0x70000D03C290) = 0 0
psynch_mutexdrop(0x10B68C070, 0xB1AB2303, 0xB1AB1F00) = 0 0
getattrlistbulk(0x3C, 0x604000300C78, 0x7F8F3E81F000) = 0 0
close(0x3C) = 0 0
getattrlist("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x60800032228C, 0x7F8F3CA35AB8)
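Reading a trace like this by eye gets tedious quickly, so a small counter helps confirm which system calls and which paths dominate. The sketch below assumes the dtruss output has been saved as text in the format shown above; `count_syscalls` and `CALL_RE` are names I made up, and only a handful of lines from the trace are embedded as sample input.

```python
import re
from collections import Counter

# A few lines copied from the dtruss output above. In practice, read the saved
# trace from a file instead of embedding it.
DTRUSS_OUTPUT = r"""
getattrlist("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x70000D03BD78, 0x70000D03B9D0) = 0 0
lstat64("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x70000D247EB0, 0x0) = 0 0
getattrlist("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x70000D0BEE90, 0x70000D0BF290) = 0 0
psynch_mutexdrop(0x10B68C070, 0xB1AB1703, 0xB1AB1400) = 0 0
psynch_mutexwait(0x10B68C070, 0xB1AB1503, 0xB1AB1100) = -1314187517 0
geteuid(0x0, 0x0, 0x0) = 501 0
psynch_mutexdrop(0x10B68C070, 0xB1AB1703, 0xB1AB1500) = 0 0
open("/Users/lin/Library/Containers/com.netease.163music/Data/Caches/online_play_cache\0", 0x1100004, 0x5) = 60 0
getattrlistbulk(0x3C, 0x60C00031A8C8, 0x7F8F3D92B000) = 0 0
close(0x3C) = 0 0
"""

# Syscall name, then an optional quoted path as first argument (dtruss prints
# the terminating NUL of the string as a literal \0).
CALL_RE = re.compile(r'^(\w+)\((?:"([^"\\]*)\\0")?')


def count_syscalls(text):
    """Return (calls per syscall name, calls per path argument)."""
    names = Counter()
    paths = Counter()
    for line in text.splitlines():
        match = CALL_RE.match(line.strip())
        if not match:
            continue  # blank or unrecognized line
        names[match.group(1)] += 1
        if match.group(2) is not None:
            paths[match.group(2)] += 1
    return names, paths


if __name__ == "__main__":
    names, paths = count_syscalls(DTRUSS_OUTPUT)
    for name, n in names.most_common():
        print(f"{n:4d}  {name}")
    for path, n in paths.most_common(3):
        print(f"{n:4d}  {path}")
```

On the full trace, every path argument is the same online_play_cache directory, which is what points the investigation at that cache.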
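- Several threads keep repeating lock -> read the same directory -> unlock, and they end up in mutexwait, so there is lock contention
- Compared with a normal run, the return value of psynch_mutexwait contains a negative number, which looked like a call that keeps failing or timing out and being retried. Without access to the client's source code, I gave up digging further
- Tried deleting the online_play_cache directory that shows up in the log; the client recreated it on restart, and a similar problem still appeared
- Later I looked up two GitHub issues about similar problems (issue 1, issue 2). Both say it is a macOS version compatibility problem, especially around fsevent in frontend tooling, and this client happens to be built around a JS frontend too
- Installed a sufficiently old version of NetEase Cloud Music, and the problem went away...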
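One detail about the negative psynch_mutexwait return values is worth checking. Reinterpreted as unsigned 32-bit numbers, they turn into values such as 0xB1AB1703, which also appear as the second argument of psynch_mutexdrop calls in the same trace. That suggests, though I cannot confirm it without the source, that the call returns an updated lock sequence word that dtruss prints as a signed integer, rather than an error code; the trailing 0 on each line appears to be the errno column. A minimal sketch of the conversion, with values copied from the trace (`as_unsigned32` is a helper name of my own):

```python
def as_unsigned32(value):
    """Reinterpret a signed 32-bit integer as unsigned."""
    return value & 0xFFFFFFFF


# Return values of psynch_mutexwait, copied from the dtruss output above.
MUTEXWAIT_RETURNS = [
    -1314187517,
    -1314187005,
    -1314186493,
    -1314185981,
    -1314185725,
    -1314184701,
]

# Second arguments of the psynch_mutexdrop calls in the same trace.
MUTEXDROP_SECOND_ARGS = {
    0xB1AB1703, 0xB1AB1903, 0xB1AB1A03, 0xB1AB1B03,
    0xB1AB1D03, 0xB1AB1E03, 0xB1AB2203, 0xB1AB2303,
}

if __name__ == "__main__":
    for ret in MUTEXWAIT_RETURNS:
        word = as_unsigned32(ret)
        seen = word in MUTEXDROP_SECOND_ARGS
        print(f"{ret} -> 0x{word:08X}  seen in a mutexdrop call: {seen}")
```

Every return value in the list decodes to a word that some psynch_mutexdrop call passed, so the negative numbers alone are not evidence of failing calls. The lock contention itself is still real.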
Closing notes
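- Why are context switches high when system calls are high? A system call has to trap from user mode into kernel mode, which switches the execution context (for example, register state has to be saved), and a call that blocks, such as a mutex wait, also lets the scheduler switch to another thread. See the 6.S081 xv6 course and OSTEP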
- Time flies: even this old system already has compatibility problems. I did not see that coming
- Better not to update things casually on an old system; the wretched client keeps popping up update prompts
- I did not find much useful information, but the process was still quite rewarding