Yandex has released the source code of Perforator, a tool designed to continuously collect detailed metrics on how applications run across large clusters and data centers. Written in C++, it can analyze applications in real time, assess how resources are distributed across Linux servers, and identify resource-intensive applications. The code is released under the MIT license and is available on GitHub.
Within Yandex, Perforator has been deployed on a cluster of more than 10,000 nodes to pinpoint and address performance issues in various services. By optimizing computations and eliminating bottlenecks, Yandex reports that it has reduced server costs by 20%.
The capabilities of Perforator include:
- Using the eBPF kernel subsystem to gather information on both kernel and user-space components, with a performance overhead estimated at around 0.1%.
- Scalable storage for performance profiles, using a DBMS for profile metadata, ClickHouse for binary metadata, and any S3-compatible storage for raw profiles and binary data.
- Unwinding call stacks in the system environment without requiring debugging information to be included.
- Providing a query language and a web interface for analyzing the CPU load created by applications.
- Visualizing bottlenecks as flame graphs.
- Profiling projects in multiple languages and runtimes without changing the build process or recompiling programs, with support for C++, Go, Rust, Java, Python, and JavaScript/Node.js.
- Generating sPGO (sampling-based profile-guided optimization) profiles for optimizing applications based on code profiling results.
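
To give a feel for how sampled call stacks turn into a flame graph that exposes bottlenecks, here is a minimal Python sketch. It is not Perforator's code or data format: the function names `fold_stacks` and `self_time_share` and the sample data are hypothetical, and the sketch only assumes that a profiler yields each sample as a list of frames ordered from root to leaf.

```python
from collections import Counter

def fold_stacks(samples):
    """Collapse stack samples into 'root;...;leaf' -> number of occurrences.

    This "folded" form is the usual text input for flame graph renderers.
    """
    folded = Counter()
    for stack in samples:
        if stack:  # skip empty samples
            folded[";".join(stack)] += 1
    return folded

def self_time_share(samples):
    """Fraction of samples in which each function was the leaf (running) frame."""
    leaves = Counter(stack[-1] for stack in samples if stack)
    total = sum(leaves.values())
    return {fn: count / total for fn, count in leaves.items()}

# Hypothetical samples, as a sampling profiler might collect them.
samples = [
    ["main", "handle_request", "parse_json"],
    ["main", "handle_request", "parse_json"],
    ["main", "handle_request", "render"],
    ["main", "gc"],
]

folded = fold_stacks(samples)
for stack, count in sorted(folded.items()):
    print(stack, count)

# Functions sorted by their share of leaf samples, largest first.
hotspots = sorted(self_time_share(samples).items(), key=lambda kv: -kv[1])
print(hotspots[0])
```

In this toy data set `parse_json` is the leaf frame in half of the samples, so it would appear as the widest top-level box in a flame graph and is the first candidate for optimization.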