We continue our overview of tools for evaluating CPU performance on Linux machines. This article covers temci, uarch-bench, likwid, perf-tools and llvm-mca.
temci
temci is a tool for measuring and comparing the execution time of two programs. It was written by Johannes Bechberger, a student from Germany, as part of his bachelor's thesis in 2016. Today, the tool is licensed under the GNU General Public License.
Johannes wanted to create a tool that would measure the performance of a computing system in a controlled environment. Therefore, one of the main features of temci is the ability to configure the test environment. For example, you can change the CPU frequency governor, disable hyper-threading and the L1 and L2 caches, turn off turbo mode on Intel processors, and more. For benchmarking, temci uses the time, perf_stat and getrusage tools.
This is how the utility works in the first case, with time:
    # compare the run times of two programs, running them each 20 times
    > temci short exec "sleep 0.1" "sleep 0.2" --runs 20
    Benchmark 20 times  [####################################]  100%
    Report for single runs
    sleep 0.1            (   20 single benchmarks)
         avg_mem_usage mean =           0.000, deviation =   0.0
         avg_res_set   mean =           0.000, deviation =   0.0
         etime         mean =      100.00000m, deviation = 0.00000%
         max_res_set   mean =         2.1800k, deviation = 3.86455%
         stime         mean =           0.000, deviation =   0.0
         utime         mean =           0.000, deviation =   0.0

    sleep 0.2            (   20 single benchmarks)
         avg_mem_usage mean =           0.000, deviation =   0.0
         avg_res_set   mean =           0.000, deviation =   0.0
         etime         mean =      200.00000m, deviation = 0.00000%
         max_res_set   mean =         2.1968k, deviation = 3.82530%
         stime         mean =           0.000, deviation =   0.0
         utime         mean =           0.000, deviation =   0.0
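To make the mean and deviation columns more concrete, here is a minimal Python sketch that times a command several times and summarizes the runs. It does not use temci and is not how temci is implemented. The helper names time_command and summarize are hypothetical, and treating "deviation" as the sample standard deviation relative to the mean is an assumption made for this example.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=20):
    """Run argv `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return (mean in seconds, relative standard deviation in percent)."""
    mean = statistics.mean(durations)
    # The sample standard deviation needs at least two data points.
    stdev = statistics.stdev(durations) if len(durations) > 1 else 0.0
    relative = 100.0 * stdev / mean if mean else 0.0
    return mean, relative


if __name__ == "__main__":
    # Same two commands as in the temci call, with fewer runs to keep it quick.
    for cmd in (["sleep", "0.1"], ["sleep", "0.2"]):
        mean, relative = summarize(time_command(cmd, runs=5))
        print(f"{' '.join(cmd)}: mean = {mean * 1000:.2f} ms, deviation = {relative:.2f}%")
```

A plain loop like this also measures process start-up overhead and runs in whatever state the machine happens to be in, which is exactly the kind of noise a controlled environment is meant to reduce.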
Based on the benchmark results, temci generates a convenient report with diagrams, tables and graphs, which sets it apart from similar solutions.
temci's main drawback is its “youth”: it does not yet support all hardware and software configurations. For example, it is difficult to run temci under macOS, and some functions are not available on systems with ARM processors. The situation may change in the future, as the author is actively developing the project and the number of stars on GitHub is gradually increasing. Not so long ago, temci was even discussed in the comments on Hacker News.
uarch-bench
uarch-bench is a utility for evaluating the performance of low-level CPU features, developed by engineer Travis Downs. Travis also runs the Performance Matters blog on GitHub Pages, where he writes about benchmarking tools and related topics. uarch-bench is only starting to gain popularity, but Hacker News users already mention it quite often in relevant threads as a go-to tool for benchmarking.
uarch-bench lets you evaluate memory performance, the speed of parallel loads, and the clearing of YMM registers. Examples of the benchmark output are available in the official repository.
Like temci, uarch-bench disables Intel Turbo Boost (which automatically raises the clock frequency under load) so that the test results are consistent.
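Before trusting a set of measurements, it can help to check what state the frequency-related settings are actually in. The following is a minimal read-only Python sketch that inspects a few standard Linux sysfs files. It is not taken from temci or uarch-bench, and it makes no claim about how either tool changes these settings. Which files exist depends on the kernel version and the active cpufreq driver, and the helper names read_setting and turbo_enabled are hypothetical.

```python
from pathlib import Path

# Standard Linux sysfs locations. Any of them may be missing, depending on
# the kernel version and the cpufreq driver in use.
SETTINGS = {
    "governor (cpu0)": "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
    "intel_pstate no_turbo": "/sys/devices/system/cpu/intel_pstate/no_turbo",
    "cpufreq boost": "/sys/devices/system/cpu/cpufreq/boost",
    "SMT control": "/sys/devices/system/cpu/smt/control",
}


def read_setting(path):
    """Return the stripped file content, or None if the file cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def turbo_enabled(no_turbo, boost):
    """Interpret the two turbo knobs; None means the state is unknown."""
    if no_turbo is not None:
        return no_turbo == "0"  # intel_pstate: "1" means turbo is disabled
    if boost is not None:
        return boost == "1"  # acpi-cpufreq style knob: "1" means boost is on
    return None


if __name__ == "__main__":
    values = {name: read_setting(path) for name, path in SETTINGS.items()}
    for name, value in values.items():
        print(f"{name:24} {value if value is not None else 'n/a'}")
    state = turbo_enabled(values["intel_pstate no_turbo"], values["cpufreq boost"])
    print("turbo enabled:", state)
```

The sketch only reads the files. Changing them requires root privileges and affects the whole machine, which is one reason to leave that step to a dedicated tool.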
The project is still in the early stages of development, so uarch-bench has no detailed documentation and may contain bugs; for example, there are difficulties with running it on Ryzen. Also, benchmarks are supported only on x86 architectures. The author promises to add more functionality in the future and invites others to join the development.
likwid
likwid is a toolkit for evaluating the performance of Linux machines with Intel, AMD and ARMv8 processors. It was created under the patronage of the German Federal Ministry of Education and Research in 2017 and released as open source.
Notable likwid tools include likwid-powermeter, which displays information from the RAPL registers about the power consumed by the system, and likwid-setFrequencies, which lets you control the processor frequency. You can find the full list in the repository.
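RAPL exposes energy counters rather than instantaneous power, so a power figure has to be derived from two readings. The Python sketch below illustrates that arithmetic using the Linux kernel's powercap sysfs interface. It does not use likwid and makes no claim about how likwid-powermeter reads the counters. The domain path is an assumption: it exists only when the kernel's RAPL powercap driver is loaded, and reading it may require root. The function names are hypothetical.

```python
import time
from pathlib import Path

# Package 0 domain of the Linux powercap interface (assumed path; present
# only when the kernel's RAPL powercap driver is loaded).
DOMAIN = Path("/sys/class/powercap/intel-rapl:0")


def average_power_watts(start_uj, end_uj, seconds, max_range_uj):
    """Average power between two energy counter readings in microjoules."""
    delta = end_uj - start_uj
    if delta < 0:
        # The counter wrapped around between the two readings.
        delta += max_range_uj
    return delta / 1e6 / seconds


def read_uj(name):
    """Read one integer attribute (in microjoules) of the powercap domain."""
    return int((DOMAIN / name).read_text())


if __name__ == "__main__":
    try:
        max_range = read_uj("max_energy_range_uj")
        start, t0 = read_uj("energy_uj"), time.monotonic()
        time.sleep(1.0)
        end, t1 = read_uj("energy_uj"), time.monotonic()
        watts = average_power_watts(start, end, t1 - t0, max_range)
        print(f"package power: {watts:.2f} W")
    except OSError as exc:
        print(f"RAPL counters not readable: {exc}")
```

The wraparound branch matters on long measurements: the energy counter has a limited range, so a later reading can be numerically smaller than an earlier one.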
The toolkit is used by engineers working in HPC research. For example, a group of specialists from the Regional Computing Center of the University of Erlangen-Nuremberg (RRZE) in Germany works with likwid and is actively involved in its development.
perf-tools
This toolkit for analyzing Linux server performance was introduced by Brendan Gregg. He is one of the developers of DTrace, a dynamic tracing framework for debugging applications in real time.
perf-tools is based on the perf_events and ftrace kernel subsystems. Its utilities let you analyze I/O latency (iosnoop), trace kernel function calls (funccount, funcslower, funcgraph and functrace) and collect statistics on file cache hits (cachestat). In the latter case, the command looks like this:
    # ./cachestat -t
    Counting cache functions... Output every 1 seconds.
    TIME         HITS   MISSES  DIRTIES    RATIO   BUFFERS_MB   CACHE_MB
    08:28:57      415        0        0   100.0%            1        191
    08:28:58      411        0        0   100.0%            1        191
    08:28:59      362       97        0    78.9%            0          8
    08:29:00      411        0        0   100.0%            0          9
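The RATIO column in this output is consistent with hits divided by total accesses. The short Python sketch below recomputes it from the HITS and MISSES values shown above. It illustrates the arithmetic only and is not cachestat's own code; the function name hit_ratio is hypothetical.

```python
def hit_ratio(hits, misses):
    """Percentage of cache accesses that were hits."""
    total = hits + misses
    if total == 0:
        return 0.0  # no accesses in the interval; a convention for this sketch
    return 100.0 * hits / total


# (time, hits, misses) copied from the sample output above
samples = [
    ("08:28:57", 415, 0),
    ("08:28:58", 411, 0),
    ("08:28:59", 362, 97),
    ("08:29:00", 411, 0),
]

for stamp, hits, misses in samples:
    print(f"{stamp}  {hit_ratio(hits, misses):5.1f}%")
```

For the third interval this gives 362 / (362 + 97), or about 78.9%, matching the value reported by the tool.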
A large community has formed around the tool (almost 6000 stars on GitHub), and companies such as Netflix actively use perf-tools. The tool is still being refined, although updates have been released quite rarely lately. As a result, errors may occur: the author writes that perf-tools can sometimes cause a kernel panic.
llvm-mca
llvm-mca is a utility that predicts how many computing resources machine code will need on different CPUs. It estimates instructions per cycle (IPC) and the load that an application places on the hardware.
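IPC itself is a simple ratio: the number of instructions executed divided by the number of cycles they took. The Python sketch below shows the arithmetic with made-up numbers for a three-instruction loop body simulated over 300 iterations. The figures are illustrative assumptions, not measurements, and the function names are hypothetical.

```python
def ipc(instructions, cycles):
    """Instructions per cycle for a simulated or measured run."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions / cycles


def cycles_per_iteration(cycles, iterations):
    """Average number of cycles spent on one iteration of the code block."""
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    return cycles / iterations


# Illustrative numbers: 3 instructions per iteration, 300 iterations,
# 610 cycles in total.
iterations = 300
instructions = 3 * iterations
total_cycles = 610

print(f"IPC: {ipc(instructions, total_cycles):.2f}")
print(f"cycles per iteration: {cycles_per_iteration(total_cycles, iterations):.2f}")
```

An IPC well below what the target CPU can dispatch per cycle usually points to a bottleneck such as a dependency chain or a saturated execution unit.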
llvm-mca was presented in 2018 as part of the LLVM project, which develops a universal system for analyzing, transforming and optimizing programs. The authors of llvm-mca were inspired by IACA, Intel's software performance analysis tool, and sought to create an alternative to it. According to users, the tool's output (its layout and the amount of detail) does resemble IACA's. However, llvm-mca only accepts AT&T syntax, so you will probably need to use converters to work with it.