Machine Learning in Compiler Optimization

August 25, 2022

Z. Wang and M. O’Boyle, “Machine Learning in Compiler Optimization,” in Proceedings of the IEEE, vol. 106, no. 11, pp. 1879-1901, Nov. 2018, doi: 10.1109/JPROC.2018.2817118.

Introduction

This article is a survey of how machine learning is applied to compiler optimization.

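Problems to Solve

Software increasingly fails to exploit the performance that hardware offers, especially in parallel computing, and compiler developers are looking for new ways to close this gap.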

First, despite the year-on-year increasing potential performance of hardware, software is increasingly unable to realize it leading to a software gap. This gap has yawned right open with the advent of multicores (see also Section VI-B). Compiler writers are looking for new ways to bridge this gap.

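Compiler developers struggle to keep pace with the evolution of computer architecture. The hope is that machine learning can learn to optimize the compiler automatically, instead of relying on experts to hand-write heuristics for optimizing code.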

Second, computer architecture evolves so quickly that it is difficult to keep up. Each generation has new quirks and compiler writers are always trying to play catchup.

… Rather than relying on expert compiler writers to develop clever heuristics to optimize the code, we can let the machine learn how to optimize a compiler to make the machine run faster, an approach sometimes referred to as autotuning [10]–[11][12][13].

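Further motivation:
- Computer science tends toward ever higher degrees of automation. For compilers, this means moving from automating compiler translation (lex, yacc) to automating compiler optimization.
- Machine learning is a classic tool for raising the level of automation.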

Machine learning is part of a tradition in computer science and compilation in increasing automation. The 1950s to 1970s were spent trying to automate compiler translation, e.g., lex for lexical analysis [14] and yacc for parsing [15]; the last decade by contrast has focused on trying to automate compiler optimization. As we will see, it is not “magic” or a panacea for compiler writers, rather it is another tool allowing automation of tedious aspects of compilation providing new opportunities for innovation.

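Assessing the Importance of the Problem

It boils down to two issues: exploiting hardware performance and automating the compilation pipeline.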

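Exploiting hardware performance:
1. There is a lot of architectural magic, such as superscalar, VLIW, and HT/SMT. Intuitively, adapting compilers to it takes a great deal of difficult work, but whether that holds in practice remains to be investigated.
2. There are also trends (RISC) that remove complexity from the hardware and shift it to the compiler. Intuitively, this likewise makes compiler development harder, but again this remains to be investigated.

Automating the compilation pipeline:
1. The difficulty of compiler development and the demand for development efficiency must be critical and widespread; otherwise projects such as LLVM and MLIR would not have been so widely adopted.
2. The trend toward automation is only a belief, although experience does support it.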

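One question: in the vast majority of cases, compiler optimization answers where (which code fragments need optimizing) and how (how to reach the optimization goal). It does not seem to need a model to decide what (which optimization to use), because the latter follows from the former two.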

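That said, some compiler optimizations are costly or have uncertain payoff, and whether to apply them has to be decided heuristically; ML may prove useful there. But this work may be difficult without necessarily being important, and it hardly amounts to ML magically automating compiler optimization.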

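Research Progress

Overview

The model's goal is to predict optimization options. This is a natural design for ML, with nothing new to figure out, but it falls well short of automating compiler optimization: compiler developers still have to write every optimization pass for the model to choose from.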
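
To make "predicting optimization options" concrete, here is a minimal sketch, not the survey's method. It assumes the option being predicted is a loop unroll factor, that each loop is described by three hand-picked features, and that the best factor for each training loop was found by offline profiling. The feature set, the training data, and the function name `predict_unroll_factor` are all hypothetical, and k-nearest neighbours is chosen only because it is short to write.

```python
from collections import Counter
from math import dist

# Hypothetical training set: loop features -> best unroll factor.
# Features are (trip_count, body_instructions, memory_ops); the labels
# stand in for results of offline profiling and are made up.
TRAINING = [
    ((1000, 4, 1), 8),
    ((800, 6, 2), 8),
    ((1200, 5, 1), 8),
    ((100, 20, 6), 2),
    ((120, 25, 8), 2),
    ((90, 18, 5), 2),
    ((8, 40, 12), 1),
    ((4, 60, 20), 1),
    ((6, 50, 15), 1),
]

# Per-feature maxima, used so that no single feature dominates the distance.
SCALES = tuple(max(features[i] for features, _ in TRAINING) for i in range(3))


def normalize(features):
    """Scale each feature into [0, 1] relative to the training maxima."""
    return tuple(f / s for f, s in zip(features, SCALES))


def predict_unroll_factor(features, k=3):
    """Return the majority unroll factor among the k nearest training loops."""
    query = normalize(features)
    nearest = sorted(TRAINING, key=lambda ex: dist(normalize(ex[0]), query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # A long-running loop with a tiny body resembles the first group.
    print(predict_unroll_factor((900, 5, 1)))
```

The sketch also shows the limitation noted above: the model only selects among unroll factors that an existing unrolling pass already supports, so the pass itself still has to be written by hand.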