On this point, Jeff Dean is decidedly optimistic. In his view, "anywhere we currently use heuristic techniques to make a decision is a good candidate for applying machine learning."
A heuristic technique (/hjʊəˈrɪstɪk/; Ancient Greek: εὑρίσκω, "find" or "discover"), often called simply a heuristic, is any approach to problem solving, learning, or discovery that employs a practical method not guaranteed to be optimal or perfect, but sufficient for the immediate goals. Where finding an optimal solution is impossible or impractical, heuristic methods can be used to speed up the process of finding a satisfactory solution. Heuristics can be mental shortcuts that ease the cognitive load of making a decision. - Wikipedia
The first point stresses finding a metric that can be expressed numerically; for reinforcement learning, this means a clear and well-defined reward. The second point, for reinforcement learning, comes down to whether an accurate environment is available; for supervised learning, it means whether training and test data can be obtained conveniently. Readers who are unsure why these two points are critical to whether RL can work in practice may refer to the widely shared article "Deep Reinforcement Learning Doesn't Work Yet," a Google Brain engineer's cautionary post on deep RL. The good news is that, for optimizing computing systems, both requirements appear comparatively easy to satisfy. To optimize device placement, for example, runtime serves as a very clean reward, and it can be obtained simply by running the computation on the real system. This is also why that article singles out Google's device placement work as a relative success: "I know there's some neat work optimizing device placement for large Tensorflow graphs (Mirhoseini et al, ICML 2017)."
“The Case for Learned Index Structures”（https://arxiv.org/abs/1712.01208）
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a record within a sorted array, a Hash-Index as a model to map a key to a position of a record within an unsorted array, and a BitMap-Index as a model to indicate if a data record exists or not. In this exploratory research paper, we start from this premise and posit that all existing index structures can be replaced with other types of models, including deep-learning models, which we term learned indexes. The key idea is that a model can learn the sort order or structure of lookup keys and use this signal to effectively predict the position or existence of records. We theoretically analyze under which conditions learned indexes outperform traditional index structures and describe the main challenges in designing learned index structures. Our initial results show, that by using neural nets we are able to outperform cache-optimized B-Trees by up to 70% in speed while saving an order-of-magnitude in memory over several real-world data sets. More importantly though, we believe that the idea of replacing core components of a data management system through learned models has far reaching implications for future systems designs and that this work just provides a glimpse of what might be possible.
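The abstract's core idea, that an index can learn the sort order of its keys (effectively their cumulative distribution) and predict a record's position, can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the paper's actual design (which uses a recursive hierarchy of models): a single least-squares linear model predicts the position, and the worst-case prediction error recorded at training time bounds a local binary search, which is what guarantees correct lookups.

```python
import bisect


def train_linear_index(keys):
    """Fit position ~ slope * key + intercept over a sorted key array.

    The model approximates the CDF of the keys. We also record the
    maximum prediction error, which bounds the later local search.
    """
    n = len(keys)
    ys = range(n)
    mean_x = sum(keys) / n
    mean_y = (n - 1) / 2
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(keys, ys))
    var = sum((x - mean_x) ** 2 for x in keys)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    max_err = max(abs(round(slope * x + intercept) - y)
                  for x, y in zip(keys, ys))
    return slope, intercept, max_err


def lookup(keys, model, key):
    """Predict the position, then binary-search only inside the
    [guess - max_err, guess + max_err] window. Returns -1 if absent."""
    slope, intercept, max_err = model
    guess = round(slope * key + intercept)
    lo = max(0, guess - max_err)
    hi = min(len(keys), guess + max_err + 1)
    i = bisect.bisect_left(keys, key, lo, hi)
    if i < len(keys) and keys[i] == key:
        return i
    return -1
```

Because the error bound is computed over the full training set, every trained key is guaranteed to fall inside the search window, so lookups are exact even when the linear model fits the key distribution poorly; a skewed distribution simply widens the window.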
“Device Placement Optimization with Reinforcement Learning”（https://arxiv.org/abs/1706.04972）
The past few years have witnessed a growth in size and computational requirements for training and inference with neural networks. Currently, a common approach to address these requirements is to use a heterogeneous distributed environment with a mixture of hardware devices such as CPUs and GPUs. Importantly, the decision of placing parts of the neural models on devices is often made by human experts based on simple heuristics and intuitions. In this paper, we propose a method which learns to optimize device placement for TensorFlow computational graphs. Key to our method is the use of a sequence-to-sequence model to predict which subsets of operations in a TensorFlow graph should run on which of the available devices. The execution time of the predicted placements is then used as the reward signal to optimize the parameters of the sequence-to-sequence model. Our main result is that on Inception-V3 for ImageNet classification, and on RNN LSTM, for language modeling and neural machine translation, our model finds non-trivial device placements that outperform hand-crafted heuristics and traditional algorithmic methods.
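The training loop described in the abstract can be sketched in heavily simplified form: sample a placement from a stochastic policy, use the (negated) runtime of that placement as the reward, and update the policy with REINFORCE. The toy cost model, the per-op Bernoulli policy, and all names below are assumptions for illustration; the paper instead uses a sequence-to-sequence network over the whole TensorFlow graph and measures real execution time on hardware.

```python
import math
import random

random.seed(0)

# Toy cost model standing in for measured execution time: ops 0 and 1
# are heavy; runtime is the load of the busiest of two devices, so
# splitting the heavy ops across devices is what lowers runtime.
COSTS = [4.0, 4.0, 1.0, 1.0]


def runtime(placement):
    loads = [0.0, 0.0]
    for op, dev in enumerate(placement):
        loads[dev] += COSTS[op]
    return max(loads)


# Policy: one independent Bernoulli logit per op.
logits = [0.0] * len(COSTS)


def sample_placement():
    placement = []
    for logit in logits:
        p1 = 1.0 / (1.0 + math.exp(-logit))
        placement.append(1 if random.random() < p1 else 0)
    return placement


# REINFORCE with a moving-average baseline; reward = -runtime.
baseline = None
lr = 0.5
for _ in range(3000):
    placement = sample_placement()
    reward = -runtime(placement)
    baseline = reward if baseline is None else 0.9 * baseline + 0.1 * reward
    advantage = reward - baseline
    for op, dev in enumerate(placement):
        p1 = 1.0 / (1.0 + math.exp(-logits[op]))
        logits[op] += lr * advantage * (dev - p1)  # d log(pi) / d logit

best = [1 if logit > 0 else 0 for logit in logits]
```

Note how cleanly this matches the "two requirements" discussed above: the reward (runtime) is a single unambiguous number, and the environment is just the system executing the placement.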
The third item is very recent work; framing address prediction for prefetching as the "next-word or character prediction" problem from natural language processing is particularly thought-provoking.
“Learning Memory Access Patterns”（https://arxiv.org/abs/1803.02329）
The explosion in workload complexity and the recent slow-down in Moore's law scaling call for new approaches towards efficient computing. Researchers are now beginning to use recent advances in machine learning in software optimizations, augmenting or replacing traditional heuristics and data structures. However, the space of machine learning for computer hardware architecture is only lightly explored. In this paper, we demonstrate the potential of deep learning to address the von Neumann bottleneck of memory performance. We focus on the critical problem of learning memory access patterns, with the goal of constructing accurate and efficient memory prefetchers. We relate contemporary prefetching strategies to n-gram models in natural language processing, and show how recurrent neural networks can serve as a drop-in replacement. On a suite of challenging benchmark datasets, we find that neural networks consistently demonstrate superior performance in terms of precision and recall. This work represents the first step towards practical neural-network based prefetching, and opens a wide range of exciting directions for machine learning in computer architecture research.
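The abstract's framing, prefetching as sequence prediction, can be made concrete with the n-gram baseline it relates prefetchers to: predict the next address *delta* from the previous k deltas, exactly as a k-gram language model predicts the next word from its context. The class name and the table-based model below are an illustrative sketch (the paper's contribution is replacing this table with recurrent neural networks):

```python
from collections import Counter, defaultdict


class NgramPrefetcher:
    """A k-gram model over address deltas: the last k deltas are the
    'context', and the most frequent delta that followed that context
    during training is the prediction."""

    def __init__(self, k=2):
        self.k = k
        self.table = defaultdict(Counter)

    def train(self, addresses):
        deltas = [b - a for a, b in zip(addresses, addresses[1:])]
        for i in range(len(deltas) - self.k):
            context = tuple(deltas[i:i + self.k])
            self.table[context][deltas[i + self.k]] += 1

    def predict(self, recent_addresses):
        """Return the predicted next address, or None for an unseen
        context (a real prefetcher would simply not issue a prefetch)."""
        deltas = tuple(b - a for a, b in
                       zip(recent_addresses, recent_addresses[1:]))
        context = deltas[-self.k:]
        if context not in self.table:
            return None
        next_delta = self.table[context].most_common(1)[0][0]
        return recent_addresses[-1] + next_delta
```

Working in delta space rather than raw addresses is what makes the "vocabulary" tractable, and it is also why the analogy to language modeling holds: a simple stride pattern is the equivalent of a sentence that always repeats the same word.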
There is also a talk by Vivienne Sze of the Eyeriss team: "Understanding the Limitations of Current Energy-Efficient Design Approaches for Deep Neural Networks". Her group's "Tutorial on Hardware Architectures for Deep Neural Networks" remains the best survey of deep neural network hardware to date. She also shared some details about Eyeriss v2.
There are also "Efficient Deep Learning Inference on Edge Devices", "Stitch-X: An Accelerator Architecture for Exploiting Unstructured Sparsity in Deep Neural Networks", and "Mobile Machine Learning Hardware at ARM: A Systems-on-Chip (SoC) Perspective".
Other entries include "Learning Graph-based Cluster Scheduling Algorithms", "Representation Learning for Resource Usage Prediction", "Better Caching with Machine Learned Advice", "Towards Interactive Curation & Automatic Tuning of ML Pipelines", "SLAQ: Quality-Driven Scheduling for Distributed Machine Learning", "Distributed Shared Memory for Machine Learning", and "Learning Network Size While Training with ShrinkNets".
On the benchmarking side, there are "DAWNBench: An End-to-End Deep Learning Benchmark and Competition" and "DNN-Train: Benchmarking and Analyzing DNN Training".
Finally, there is an interesting paper called "In-network Neural Networks". Its basic idea is to use the programmable compute resources in today's network devices to run neural network applications, which is similar to the idea of accelerating AI applications directly inside network equipment that I discussed in my earlier article "AI芯片开年". Relatedly, around MWC, Nokia announced "ReefShark", its 5G base-station chipset, emphasizing its AI compute capability and claiming it will turn operators' networks into the largest AI computing platform. As ever more data needs to be processed locally, nodes throughout the network, from edge devices all the way to the cloud, are likely to gain increasing AI processing capability, handling data as close as possible to where it is produced.