Re-constructing high-level information for language-specific binary re-optimization
Abstract
In this paper, we show a binary optimizer can achieve competitive performance relative to a state-of-the-art source code compiler by re-constructing high-level information (HLI) from binaries. Recent advances in compiler technologies have resulted in a large performance gap between binaries compiled with old compilers and those compiled with latest ones. This motivated us to develop a binary optimizer for old binaries using a compiler engine for a latest source code compiler. However, a traditional approach to naively convert machine instructions into an intermediate representation (IR) of the compiler engine, does not allow us to take full advantage of optimization techniques available in the compiler. This is because the HLI, such as information about variables and their data types, is not available in such an IR. To address this issue, we have devised a technique to re-construct the HLI from binaries by using contextual information. This contextual information is a set of knowledge about specific compilation technologies, such as the conventions of data structures, the patterns of instruction sequences, and the semantics of runtime routines. With this technique, our binary optimizer has improved the performance of binaries generated from an older compiler by 40.1% on average in the CPU time for a set of benchmarks, which is close to the one due to a source-code recompilation with the same compiler engine, 55.2% on average.