Kwai announces the open-sourcing of KOOM, its in-house OOM solution

Kwai recently announced that it has open-sourced KOOM, becoming the first Internet company to open-source a solution for online out-of-memory (OOM) problems. According to Kwai, once KOOM finishes memory monitoring on the client, it uploads the analysis report to the cloud. The transferred file is only kilobytes in size, so users notice nothing at runtime and their data traffic is not affected, which makes the scheme suitable for large-scale deployment. The scheme is already in use in Kwai's business, where it has reduced the OOM rate by more than 80%.

OOM is a common and difficult problem in Android development, and OOMs that occur online are especially hard to locate. LeakCanary, the best-known solution in the industry, addresses Java OOM by monitoring Activity / Fragment leaks. It has protected a great many apps for years and took OOM management from zero to one. However, given today's increasingly complex business environments and huge user traffic, LeakCanary still has room for improvement: limited by performance, it cannot be deployed online at scale and only supports offline use; it can only locate Activity and Fragment leaks, not large objects or frequent allocations; and its results have to be analyzed manually one by one, so problems cannot be clustered or quantified.

To solve OOM thoroughly, the industry has tried a variety of solutions, usually optimizations built on LeakCanary, but so far none has fully solved the performance problems of the monitoring process. The common compromise is sampling: sacrificing the experience of a small number of users in order to locate problems.

Kwai OOM Killer follows the industry's line of research. Through in-house development and rework, it tackles the problems that LeakCanary cannot solve, keeping LeakCanary's original strengths while making up for its shortcomings. The result is a highly automated, closed-loop monitoring system that can be deployed both online and offline, configured flexibly, and applied widely, offering one-stop service across instrumentation, monitoring, analysis, reporting, issue assignment, follow-up, and alerting. The system intercepts most OOM problems during the gray release stage, before they reach all users.

KOOM's core workflow consists of configuration delivery and decision-making, memory status monitoring, memory image collection, image file parsing, report generation and upload, alert aggregation, and issue assignment and follow-up.

Previously, the common practice in the industry was to trigger two consecutive GCs after Activity.onDestroy and then check the reference queue to determine whether the Activity had leaked. But frequent GC causes stutter that users can perceive. To make triggering imperceptible, Kwai designed a new monitoring module that triggers image collection through memory threshold monitoring. Because the judgment of whether an object has leaked is deferred to the parsing stage, threshold monitoring only needs to read a few memory indicators of interest periodically on a background thread, and its performance cost is negligible.

The traditional way of collecting a memory image freezes the application completely for several seconds, during which the user cannot operate it at all, which seriously damages the user experience. Kwai takes advantage of the kernel's copy-on-write (COW) mechanism: before each dump of the memory image, it suspends the virtual machine and forks a child process to perform the dump, and the parent process resumes the virtual machine as soon as the fork succeeds. The whole procedure costs the parent process only a few milliseconds and has no effect on the user.

The hprof files produced by traditional schemes are usually large. They take up a lot of the user's disk space, waste user traffic when uploaded, and are not well suited to cluster analysis. Kwai takes a different approach based on the idea of edge computing: during idle time, the memory image is analyzed locally on the device by a single thread in a separate process, without taking up excessive system resources at runtime.
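
The threshold-triggered collection and fork-based dump described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not KOOM's actual Android implementation. The names `ThresholdMonitor`, `fork_and_dump`, and `run_demo`, the 0.8 threshold, and the rule of three consecutive samples are all illustrative assumptions, and `os.fork` stands in for the native fork that a real implementation would call after suspending the virtual machine.

```python
import json
import os
import tempfile


class ThresholdMonitor:
    """Hypothetical monitor: fires once usage stays above a ratio for N samples."""

    def __init__(self, threshold=0.8, required_hits=3):
        self.threshold = threshold
        self.required_hits = required_hits
        self.hits = 0

    def sample(self, used_bytes, max_bytes):
        # Only cheap counters are read here; leak judgment is deferred to parsing.
        if used_bytes / max_bytes >= self.threshold:
            self.hits += 1
        else:
            self.hits = 0
        return self.hits >= self.required_hits


def fork_and_dump(state, path):
    """Dump `state` from a forked child; the parent gets the child's pid back."""
    pid = os.fork()
    if pid == 0:
        # Child: works on a copy-on-write snapshot of memory as of the fork.
        status = 1
        try:
            with open(path, "w") as f:
                json.dump(state, f)
            status = 0
        finally:
            os._exit(status)  # never fall through into the parent's code path
    return pid


def run_demo():
    monitor = ThresholdMonitor(threshold=0.8, required_hits=3)
    heap = {"objects": [1, 2, 3]}  # stand-in for the app's live heap
    samples = [(50, 100), (85, 100), (90, 100), (95, 100)]  # (used, max)
    dump_path = os.path.join(tempfile.mkdtemp(), "heap_snapshot.json")
    for used, limit in samples:
        if monitor.sample(used, limit):
            pid = fork_and_dump(heap, dump_path)
            heap["objects"].append(4)  # parent keeps mutating right away
            os.waitpid(pid, 0)  # the demo waits; a real app would not block
            break
    with open(dump_path) as f:
        return json.load(f), heap


if __name__ == "__main__":
    snapshot, live = run_demo()
    print("snapshot:", snapshot)
    print("live heap:", live)
```

In this sketch the child writes the heap as it looked at the moment of the fork, while the parent continues to change its own copy, which is the property that lets the dump proceed without freezing the application.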
Once the analysis is complete, the image is deleted so that it does not occupy disk space; the analysis report is only kilobytes in size and does not waste user traffic. Report generation is divided into three steps. The first step scans the image and builds an index, which lays the groundwork for leak detection and analysis. The second step finds the leaked objects, judging whether an object has leaked according to existing framework knowledge and manually configured rules. The third step generates the final report file, adding the object leak paths, the number of leaks, class statistics, and runtime information to assist the later analysis and resolution of OOM problems.

For cases where the image itself needs to be retrieved, the hprof file is stripped at runtime through hooks, and only the data necessary for analyzing OOM is retained. Stripping also has the benefit of data masking: it keeps only the structure of classes and objects in memory that is useful for analysis and does not upload real business data, which fully protects user privacy.
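
The three-step report generation described above can be sketched on a toy heap image. This is a simplified illustration in Python, not KOOM's parser: the `HEAP` dictionary, the `destroyed` flag, and the rule "reachable from a GC root but already destroyed means leaked" are assumptions made for the example, and runtime information is omitted from the report.

```python
import json
from collections import Counter, deque

# Toy heap image: object id -> class name, outgoing references, destroyed flag.
# Hypothetical structure; a real hprof parser would build this from the file.
HEAP = {
    1: {"cls": "GcRoot", "refs": [2, 3], "destroyed": False},
    2: {"cls": "ImageCache", "refs": [4], "destroyed": False},
    3: {"cls": "MainActivity", "refs": [], "destroyed": False},
    4: {"cls": "DetailActivity", "refs": [5], "destroyed": True},
    5: {"cls": "Bitmap", "refs": [], "destroyed": False},
}
GC_ROOTS = [1]


def build_index(heap, roots):
    """Step 1: breadth-first scan from the GC roots, recording each parent."""
    parent = {root: None for root in roots}
    queue = deque(roots)
    while queue:
        obj = queue.popleft()
        for ref in heap[obj]["refs"]:
            if ref not in parent:
                parent[ref] = obj
                queue.append(ref)
    return parent


def find_leaks(heap, parent):
    """Step 2: assumed rule, a reachable object that is destroyed has leaked."""
    return [obj for obj in parent if heap[obj]["destroyed"]]


def leak_path(heap, parent, obj):
    """Walk parent links back to the root to get the shortest reference path."""
    path = []
    while obj is not None:
        path.append(heap[obj]["cls"])
        obj = parent[obj]
    return list(reversed(path))


def build_report(heap, roots):
    """Step 3: assemble leak paths, leak count, and class statistics."""
    parent = build_index(heap, roots)
    leaks = find_leaks(heap, parent)
    return {
        "leak_count": len(leaks),
        "leak_paths": [leak_path(heap, parent, obj) for obj in leaks],
        "class_stats": dict(Counter(heap[obj]["cls"] for obj in parent)),
    }


if __name__ == "__main__":
    print(json.dumps(build_report(HEAP, GC_ROOTS), indent=2))
```

Because the report carries only class names, counts, and reference paths rather than object contents, its serialized form stays small, in line with the kilobyte-level reports mentioned above.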

Kwai plans to build KOOM into a complete client-side memory solution. Developers can address the OOM problems in their projects by integrating KOOM. The first phase of the open-source release covers only the Android Java OOM solution for now. In the future, Android thread / file descriptor monitoring, Android native OOM monitoring, iOS OOM monitoring, and more will also be open-sourced, with the ultimate goal of helping developers solve OOM in all kinds of scenarios.